idnits 2.17.1 

draft-ietf-psamp-sample-tech-11.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** It looks like you're using RFC 3978 boilerplate.  You should update this
     to the boilerplate described in the IETF Trust License Policy document
     (see https://trustee.ietf.org/license-info), which is required now.

  -- Found old boilerplate from RFC 3978, Section 5.1 on line 22.

  -- Found old boilerplate from RFC 3978, Section 5.5, updated by RFC 4748 on
     line 1927.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 1895.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 1902.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 1908.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  == There are 14 instances of lines with non-RFC6890-compliant IPv4
     addresses in the document.  If these are example addresses, they should
     be changed.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust Copyright Line does not match the
     current year

  == Line 2038 has weird spacing: '...ong int  ub4; ...'

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (July 9, 2008) is 5768 days in the past.  Is this
     intentional?

  -- Found something which looks like a code comment -- if you have code
     sections in the document, please surround them with '<CODE BEGINS>' and
     '<CODE ENDS>' lines.


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: 'PSAMP-TECH' is mentioned on line 173, but not defined

  -- Looks like a reference, but probably isn't: '1' on line 2127

  == Missing Reference: 'N' is mentioned on line 674, but not defined

  == Missing Reference: 'RCF2205' is mentioned on line 870, but not defined

  -- Looks like a reference, but probably isn't: '0' on line 2152

  -- Looks like a reference, but probably isn't: '00' on line 1532

  == Missing Reference: 'FF' is mentioned on line 1532, but not defined

  -- Looks like a reference, but probably isn't: '2' on line 2127

  -- Looks like a reference, but probably isn't: '4' on line 2148

  -- Looks like a reference, but probably isn't: '5' on line 2129

  -- Looks like a reference, but probably isn't: '6' on line 2129

  -- Looks like a reference, but probably isn't: '8' on line 2131

  -- Looks like a reference, but probably isn't: '9' on line 2131

  == Unused Reference: 'RFC1624' is defined on line 1794, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC2205' is defined on line 1797, but no explicit
     reference was found in the text

  -- Obsolete informational reference (is this intentional?): RFC 5102
     (Obsoleted by RFC 7012)

  -- Obsolete informational reference (is this intentional?): RFC 5101
     (Obsoleted by RFC 7011)

  == Outdated reference: A later version (-13) exists of
     draft-ietf-psamp-framework-11

  == Outdated reference: A later version (-11) exists of
     draft-ietf-psamp-info-06

  == Outdated reference: A later version (-09) exists of
     draft-ietf-psamp-protocol-07

  -- Duplicate reference: RFC1624, mentioned in 'RFC1624', was also mentioned
     in 'RFC1141'.


     Summary: 1 error (**), 0 flaws (~~), 12 warnings (==), 20 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	 Internet Draft                                               T. Zseby
3	 Document: <draft-ietf-psamp-sample-tech-11.txt>      Fraunhofer FOKUS
4	 Intended status: Proposed Standard                          M. Molina
5	 Expires: December 2008                                          DANTE
6	                                                           N. Duffield
7	                                                    AT&T Labs-Research
8	                                                          S. Niccolini
9	                                                       NEC Europe Ltd.
10	                                                            F. Raspall
11	                                                              EPSC-UPC
12	                                                          July 9, 2008

14	    Sampling and Filtering Techniques for IP Packet Selection

16	 Status of this Memo

18	    By submitting this Internet-Draft, each author represents that
19	    any applicable patent or other IPR claims of which he or she is
20	    aware have been or will be disclosed, and any of which he or she
21	    becomes aware will be disclosed, in accordance with Section 6 of
22	    BCP 79.

24	    Internet-Drafts are working documents of the Internet
25	    Engineering Task Force (IETF), its areas, and its working
26	    groups.  Note that other groups may also distribute working
27	    documents as Internet-Drafts.

29	    Internet-Drafts are draft documents valid for a maximum of six
30	    months and may be updated, replaced, or obsoleted by other
31	    documents at any time.  It is inappropriate to use Internet-
32	    Drafts as reference material or to cite them other than as "work
33	    in progress."

35	    The list of current Internet-Drafts can be accessed at
36	    http://www.ietf.org/ietf/1id-abstracts.txt.

38	    The list of Internet-Draft Shadow Directories can be accessed at
39	    http://www.ietf.org/shadow.html.

41	    This Internet-Draft will expire on December, 2008.

43	 Copyright Notice

45	    Copyright (C) The IETF Trust (2008).

47	 Abstract

49	    This document describes Sampling and Filtering techniques for IP
50	    packet selection. It provides a categorization of schemes and
51	    defines what parameters are needed to describe the most common
52	    selection schemes. Furthermore it shows how techniques can be
53	    combined to build more elaborate packet Selectors. The document
54	    provides the basis for the definition of information models for
55	    configuring selection techniques in Metering Processes and for
56	    reporting the technique in use to a Collector.

58	 Conventions used in this document

60	    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
61	    NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
62	    "OPTIONAL" in this document are to be interpreted as described
63	    in RFC 2119 [RFC2119].

65	 Table of Contents

67	    1.   Introduction................................................. 4
68	    2.   PSAMP Documents Overview..................................... 5
69	    3.   Terminology.................................................. 5
70	    3.1     Observation Points, Packet Streams and Packet Content..... 5
71	    3.2     Selection Process......................................... 6
72	    3.3     Reporting................................................. 8
73	    3.4     Metering Process ......................................... 8
74	    3.5     Exporting Process......................................... 8
75	    3.6     PSAMP Device.............................................. 9
76	    3.7     Collector................................................. 9
77	    3.8     Selection Methods......................................... 9
78	    4.   Categorization of Packet Selection Techniques............... 12
79	    5.   Sampling.................................................... 14
80	    5.1     Systematic Sampling...................................... 14
81	    5.2     Random Sampling.......................................... 15
82	    5.2.1   n-out-of-N Sampling...................................... 15
83	    5.2.2   Probabilistic Sampling................................... 16
84	    5.2.2.1 Uniform Probabilistic Sampling........................... 16
85	    5.2.2.2 Non-Uniform Probabilistic Sampling....................... 16
86	    5.2.2.3 Non-Uniform Flow State Dependent Sampling................ 16
87	    5.2.2.4 Configuration of non-uniform probabilistic and flow-
88	             state Sampling.......................................... 17
89	    6.   Filtering................................................... 17
90	    6.1     Property Match Filtering................................. 18
91	    6.2     Hash-based Filtering..................................... 20
92	    6.2.1   Application Examples for Coordinated Packet Selection ... 21
93	    6.2.1.1 Trajectory Sampling...................................... 21
94	    6.2.1.2 Passive One-way Measurements............................. 21
95	    6.2.1.3 Generation of Pseudo-random Numbers...................... 22
96	    6.2.2   Desired Properties of Hash Functions..................... 22
97	    6.2.2.1 Requirements for Packet Selection........................ 23
98	    6.2.2.2 Requirements for Packet Digesting........................ 23
99	    6.2.3   Security Considerations for Hash Functions............... 24
100	    6.2.3.1 Vulnerabilities of Hash-based selection without
101	             knowledge of selection outcomes......................... 25
102	    6.2.3.2 Vulnerabilities of Hash-based selection using knowledge
103	             of selection outcomes................................... 26
104	    6.2.3.3 Vulnerabilities to Replay Attacks........................ 27
105	    6.2.4   Choice of Hash-Function.................................. 27
106	    6.2.4.1 Hash Functions for Packet Selection...................... 28
107	    6.2.4.2 Hash Functions Suitable for Packet Digesting............. 30
108	    7.   Parameters for the Description of Selection Techniques...... 30
109	    7.1     Description of Sampling Techniques....................... 31
110	    7.2     Description of Filtering Techniques...................... 32
111	    8.   Composite Techniques........................................ 34
112	    8.1     Cascaded Filtering->Sampling or Sampling->Filtering...... 35
113	    8.2     Stratified Sampling...................................... 35
114	    9.   Security Considerations..................................... 36
115	    10.  Acknowledgements............................................ 37
116	    11.  IANA Considerations......................................... 37
117	    12.  Normative References........................................ 37
118	    13.  Informative References...................................... 37
119	    14.  Authors' Addresses.......................................... 40
120	    15.  Contributors................................................ 41
121	    16.  Intellectual Property Statement............................. 41
122	    17.  Copyright Statement......................................... 42
123	    18.  Disclaimer.................................................. 42
124	    Appendix A: Hash Functions....................................... 42
125	    A.1 IP Shift-XOR (IPSX) Hash Function............................ 42
126	    A.2 BOB Hash Function............................................ 43

128	 1. Introduction

130	    There are two main drivers for the growth in measurement
131	    infrastructures and their underlying technology. First, network
132	    data rates are increasing, with a concomitant growth in
133	    measurement data. Secondly, the growth is compounded by the
134	    demand of measurement-based applications for increasingly fine
135	    grained traffic measurements. Devices such as routers, which
136	    perform the measurements, require increasingly sophisticated and
137	    resource intensive measurement capabilities, including the
138	    capture of packet headers or even parts of the payload, and
139	    classification for flow analysis. All these factors can lead to
140	    an overwhelming amount of measurement data, resulting in high
141	    demands on resources for measurement, storage, transfer and post
142	    processing.

144	    The sustained capture of network traffic at line rate can be
145	    performed by specialized measurement hardware. However, the cost
146	    of the hardware and the measurement infrastructure required to
147	    accommodate the measurements preclude this as a ubiquitous
148	    approach. Instead some form of data reduction at the point of
149	    measurement is necessary.
150	    This can be achieved by an intelligent packet selection through
151	    Sampling or Filtering. Another way to reduce the amount of data
152	    is to use aggregation techniques (not addressed in this
153	    document). The motivation for Sampling is to select a
154	    representative subset of packets that allow accurate estimates
155	    of properties of the unsampled traffic to be formed. The
156	    motivation for Filtering is to remove all packets that are not
157	    of interest. Aggregation combines data and allows compact pre-
158	    defined views of the traffic. Examples of applications that
159	    benefit from packet selection are given in [PSAMP-FW].
160	    Aggregation techniques are out of scope of this document.

162	 2. PSAMP Documents Overview

164	    This document is one out of a series of documents from the PSAMP
165	    group.

167	    [PSAMP-FW]:   "A Framework for Packet Selection and Reporting"
168	                   describes the PSAMP framework for network elements
169	                   to select subsets of packets by statistical and
170	                   other methods, and to export a stream of reports
171	                   on the selected packets to a Collector.

173	    [PSAMP-TECH]: "Sampling and Filtering Techniques for IP Packet
174	                   Selection" (this document) describes the set of
175	                   packet selection techniques supported by PSAMP.

177	    [PSAMP-PROTO]: "Packet Sampling (PSAMP) Protocol Specifications"
178	                   specifies the export of packet information from a
179	                   PSAMP Exporting Process to a PSAMP Collecting
180	                   Process.

182	    [PSAMP-INFO]: "Information Model for Packet Sampling Exports"
183	                   defines an information and data model for PSAMP.

185	 3. Terminology

187	    The PSAMP terminology defined here is fully consistent with all
188	    terms listed in [PSAMP-FW] but includes additional terms
189	    required for the description of packet selection methods. An
190	    architecture overview and possible configurations of PSAMP
191	    elements can be found in [PSAMP-FW]. PSAMP terminology also aims
192	    at consistency with terms used in [RFC3917]. The relationship
193	    between PSAMP and IPFIX terms is described in [PSAMP-FW].

195	    In the PSAMP documents all defined PSAMP terms are written
196	    capitalized. This document uses the same convention.

198	 3.1 Observation Points, Packet Streams and Packet Content

200	    * Observation Point

202	       An Observation Point is a location in the network where
203	       packets can be observed. Examples include:

205	         (i)  A line to which a probe is attached;
206	         (ii) a shared medium, such as an Ethernet-based LAN;

208	         (iii) a single port of a router, or set of interfaces
209	               (physical or logical) of a router;

211	         (iv) an embedded measurement subsystem within an interface.

213	       Note that one Observation Point may be a superset of several
214	       other Observation Points.  For example one Observation Point
215	       can be an entire line card.  This would be the superset of the
216	       individual Observation Points at the line card's interfaces.

218	    * Observed Packet Stream

220	       The Observed Packet Stream is the set of all packets observed
221	       at the Observation Point.

223	    * Packet Stream

225	       A packet stream denotes a set of packets that flows past some
226	       specified point within the metering process. An example of a
227	       Packet Stream is the output of the selection process.
228	       Note that packets selected from a stream, e.g. by Sampling, do
229	       not necessarily possess a property by which they can be
230	       distinguished from packets that have not been selected. For
231	       this reason the term "stream" is favored over "flow", which is
232	       defined as set of packets with common properties [RFC3917].

234	    * Packet Content

236	       The packet content denotes the union of the packet header
237	       (which includes link layer, network layer and other
238	       encapsulation headers) and the packet payload. At some
239	       Observation Points the link header information may not be
240	       available.

242	 3.2 Selection Process

244	    * Selection Process

246	       A Selection Process takes the Observed Packet Stream as its
247	       input and selects a subset of that stream as its output.

249	    * Selection State

251	       A Selection Process may maintain state information for use by
252	       the Selection Process. At a given time, the Selection State
253	       may depend on packets observed at and before that time, and
254	       other variables. Examples include:

256	         (i)  sequence numbers of packets at the input of Selectors;

258	         (ii) a timestamp of observation of the packet at the
259	               Observation Point;

261	         (iii) iterators for pseudo-random number generators;

263	         (iv) hash values calculated during selection;

265	         (v)  indicators of whether the packet was selected by a
266	               given Selector;

268	       Selection Processes may change portions of the Selection State
269	       as a result of processing a packet. Selection state for a
270	       packet is to reflect the state after processing the packet.

272	    * Selector

274	       A Selector defines the action of a Selection Process on a
275	       single packet of its input. If selected, the packet becomes an
276	       element of the output Packet Stream.

278	       The Selector can make use of the following information in
279	       determining whether a packet is selected:

281	         (i)  the packet's content;

283	         (ii) information derived from the packet's treatment at the
284	               Observation Point;

286	         (iii) any selection state that may be maintained by the
287	               Selection Process.

289	    * Composite Selector

291	       A Composite Selector is an ordered composition of Selectors,
292	       in which the output Packet Stream issuing from one Selector
293	       forms the input Packet Stream to the succeeding Selector.

295	    * Primitive Selector

297	       A Selector is primitive if it is not a Composite Selector.

299	    * Selection Sequence
300	       From all the packets observed at an Observation Point, only a
301	       few packets are selected by one or more Selectors.  The
302	       Selection Sequence is a unique value per Observation Domain
303	       describing the Observation Point and the Selector IDs through
304	       which the packets are selected.

306	 3.3 Reporting

308	    * Packet Reports

310	       Packet Reports comprise a configurable subset of a packet's
311	       input to the Selection Process, including the packet's
312	       content, information relating to its treatment (for example,
313	       the output interface), and its associated selection state (for
314	       example, a hash of the packet's content)

316	    * Report Interpretation:

318	       Report Interpretation comprises subsidiary information,
319	       relating to one or more packets, that is used for
320	       interpretation of their packet reports. Examples include
321	       configuration parameters of the Selection Process.

323	    * Report Stream:

325	       The Report Stream is the output of a Metering Process,
326	       comprising two distinguished types of information: Packet
327	       Reports, and Report Interpretation.

329	 3.4 Metering Process

331	       A Metering Process selects packets from the Observed Packet
332	       Stream using a Selection Process, and produces as output a
333	       Report Stream concerning the selected packets. The PSAMP
334	       Metering Process can be viewed as analogous to the IPFIX
335	       metering process [RFC5101], which produces flow records as its
336	       output.  While the Metering Process definition in this
337	       document specifies the PSAMP definition, the PSAMP protocol
338	       specifications [PSAMP-PROTO] will use the IPFIX Metering
339	       Process definition, which also suits the PSAMP requirements.
340	       The relationship between PSAMP and IPFIX is described more in
341	       [PSAMP-INFO] and [PSAMP-PROTO].

343	 3.5 Exporting Process

345	    * Exporting Process:

347	       An Exporting Process sends, in the form of Export Packet, the
348	       output of one or more Metering Processes to one or more
349	       Collectors.

351	    * Export Packet:

353	       An Export Packet is a combination of Report Interpretation
354	       and/or one or more Packet Reports are bundled by the Exporting
355	       Process into an Export Packet for exporting to a Collector.

357	 3.6 PSAMP Device

359	    * PSAMP Device

361	       A PSAMP Device is a device hosting at least an Observation
362	       Point, a Metering Process (which includes a Selection Process)
363	       and an Exporting Process.  Typically, corresponding
364	       Observation Point(s), Metering Process(es) and Exporting
365	       Process(es) are co-located at this device, for example at a
366	       router.

368	 3.7 Collector

370	    * Collector

372	       A Collector receives a Report Stream exported by one or more
373	       Exporting Processes. In some cases, the host of the Metering
374	       and/or Exporting Processes may also serve as the Collector.

376	 3.8 Selection Methods

378	    * Filtering
379	       A filter is a Selector that selects a packet deterministically
380	       based on the Packet Content, or its treatment, or functions of
381	       these occurring in the Selection State.  Two examples are:

383	         (i) Property match filtering: a packet is selected if a
384	               specific field in the packet equals a predefined
385	               value.

387	         (ii) Hash-based selection: a hash function is applied to
388	               the Packet Content, and the packet is selected if the
389	               result falls in a specified range.

391	    * Sampling

393	       A selector that is not a filter is called a sampling
394	       operation.  This reflects the intuitive notion that if the
395	       selection of a packet cannot be determined from its content
396	       alone, there must be some type of sampling taking place.
397	       Sampling operations can be divided into two subtypes:

399	          (i) Content-independent sampling, which does not use
400	               Packet Content in reaching sampling decisions.
401	               Examples include systematic sampling, and uniform
402	               pseudo-random sampling driven by a pseudo-random
403	               number whose generation is independent of Packet
404	               Content.  Note that in Content-independent Sampling it
405	               is not necessary to access the Packet Content in order
406	               to make the selection decision.

408	         (ii) Content-dependent sampling, in which the Packet
409	               Content is used in reaching selection decisions.  An
410	               application is pseudo-random selection according to a
411	               probability that depends on the contents of a packet
412	               field, e.g., sampling packets with a probability
413	               dependent on their TCP/UDP port numbers.  Note that
414	               this is not a Filter.

416	    * Hash Domain

418	       A subset of the Packet Content and the packet treatment,
419	       viewed as an N-bit string for some positive integer N.

421	    * Hash Range

423	       A set of M-bit strings for some positive integer M that define
424	       the range of values the result of the hash operation can take.

426	    * Hash Function

428	       A Hash Function defines a deterministic mapping from the Hash
429	       Domain into the Hash Range.

431	    * Hash Selection Range

433	       The Hash Selection Range is a subset of the Hash Range. The
434	       packet is selected if the action of the Hash Function on the
435	       Hash Domain for the packet yields a result in the Hash
436	       Selection Range.

438	    * Hash-based Selection

440	       Hash-based Selection is a Filtering specified by a Hash
441	       Domain, a Hash Function, and Hash Range and a Hash Selection
442	       Range.

444	    * Approximative Selection

446	       Selectors in any of the above categories may be approximated
447	       by operations in the same or another category for the purposes
448	       of implementation. For example, uniform pseudo-random Sampling
449	       may be approximated by Hash-based Selection, using a suitable
450	       Hash Function and Hash Domain. In this case, the closeness of
451	       the approximation depends on the choice of Hash Function and
452	       Hash Domain.

454	    * Population

456	       A Population is a Packet Stream, or a subset of a Packet
457	       Stream. A Population can be considered as a base set from
458	       which packets are selected. An example is all packets in the
459	       Observed Packet Stream that are observed within some specified
460	       time interval.

462	    * Population Size

464	       The Population Size is the number of all packets in the
465	       Population.

467	    * Sample Size

469	       The number of packets selected from the Population by a
470	       Selector.

472	    * Configured Selection Fraction

474	       The Configured Selection Fraction is the ratio of the number
475	       of packets selected by a Selector from an input Population, to
476	       the Population Size, as based on the configured selection
477	       parameters.

479	    * Attained Selection Fraction

481	       The Attained Selection Fraction is the actual ratio of the
482	       number of packets selected by a Selector from an input
483	       Population, to the Population Size.

485	    For some sampling methods the Attained Selection Fraction can
486	    differ from the Configured Selection Fraction due to, for
487	    example, the inherent statistical variability in sampling
488	    decisions of probabilistic Sampling and Hash-based Selection.
489	    Nevertheless, for large Population Sizes and properly configured
490	    Selectors, the Attained Selection Fraction usually approaches
491	    the Configured Selection Fraction.

493	 4. Categorization of Packet Selection Techniques

495	    Packet selection techniques generate a subset of packets from an
496	    Observed Packet Stream at an Observation Point. We distinguish
497	    between Sampling and Filtering.

499	    Sampling is targeted at the selection of a representative subset
500	    of packets. The subset is used to infer knowledge about the
501	    whole set of observed packets without processing them all. The
502	    selection can depend on packet position, and/or on packet
503	    content, and/or on (pseudo) random decisions.

505	    Filtering selects a subset with common properties. This is used
506	    if only a subset of packets is of interest. The properties can
507	    be directly derived from the packet content, or depend on the
508	    treatment given by the router to the packet. Filtering is a
509	    deterministic operation. It depends on packet content or router
510	    treatment. It never depends on packet position or on (pseudo)
511	    random decisions.

513	    Note that a common technique to select packets is to compute a
514	    Hash Function on some bits of the packet header and/or content
515	    and to select it if the Hash Value falls in the Hash Selection
516	    Range. Since hashing is a deterministic operation on the packet
517	    content, it is a Filtering technique according to our
518	    categorization. Nevertheless, Hash Functions are sometimes used
519	    to emulate random Sampling. Depending on the chosen input bits,
520	    the Hash Function and the Hash Selection Range, this technique
521	    can be used to emulate the random selection of packets with a
522	    given probability p. It is also a powerful technique to
523	    consistently select the same packet subset at multiple
524	    Observation Points [DuGr00]

526	    The following table gives an overview of the schemes described
527	    in this document and their categorization. An X in brackets (X)
528	    denotes schemes for which also content-independent variants
529	    exist. It easily can be seen that only schemes with both
530	    properties, content dependence and deterministic selection, are
531	    considered as filters.

533	           Selection Scheme   | Deterministic | Content- | Category
534	                              |  Selection    | dependent|
535	      ------------------------+---------------+----------+----------
536	       Systematic             |       X       |     _    | Sampling
537	       Count-based            |               |          |
538	      ------------------------+---------------+----------+----------
539	       Systematic             |       X       |     -    | Sampling
540	       Time-based             |               |          |
541	      ------------------------+---------------+----------+----------
542	       Random                 |       -       |     -    | Sampling
543	       n-out-of-N             |               |          |
544	      ------------------------+---------------+----------+----------
545	       Random                 |       -       |     -    | Sampling
546	       Uniform probabilistic  |               |          |
547	      ------------------------+---------------+----------+----------
548	       Random                 |       -       |    (X)   | Sampling
549	       Non-uniform probabil.  |               |          |
550	      ------------------------+---------------+----------+----------
551	       Random                 |       -       |    (X)   | Sampling
552	       Non-uniform flow-state |               |          |
553	      ------------------------+---------------+----------+----------
554	       Property Match         |       X       |    (X)   | Filtering
555	       Filtering              |               |          |
556	      ------------------------+---------------+----------+----------
557	       Hash Function          |       X       |     X    | Filtering
558	      ------------------------+---------------+----------+----------

560	    In the table x means that the characteristic applies to the
561	    selection scheme and (x) means that the characteristic only
562	    partly applies to the selection scheme. For instance property
563	    match filtering is typically based on packet content and
564	    therefore content dependent. But as explained in section 6.1 it
565	    may also depend on router state and then would be independent of
566	    the content.

568	    The categorization just introduced is mainly useful for the
569	    definition of an information model describing Primitive
570	    Selectors. More complex selection techniques can be described
571	    through the composition of cascaded Sampling and Filtering
572	    operations. For example, a packet selection that weights the
573	    selection probability on the basis of the packet length can be
574	    described as a cascade of a Filtering and a Sampling scheme.
575	    However, this descriptive approach is not intended to be rigid:
576	    if a common and consolidated selection practice turns out to be
577	    too complex to be described as a composition of the mentioned
578	    building blocks, an ad hoc description can be specified instead
579	    and added as a new scheme to the information model.

581	 5. Sampling

583	    The deployment of Sampling techniques aims at the provisioning
584	    of information about a specific characteristic of the parent
585	    population at a lower cost than a full census would demand. In
586	    order to plan a suitable Sampling strategy it is therefore
587	    crucial to determine the needed type of information and the
588	    desired degree of accuracy in advance.

590	    First of all, it is important to know the type of metric that
591	    should be estimated. The metric of interest can range from
592	    simple packet counts [JePP92] up to the estimation of whole
593	    distributions of flow characteristics (e.g. packet
594	    sizes)[ClPB93].

596	    Secondly, the required accuracy of the information and with
597	    this, the confidence that is aimed at, should be known in
598	    advance. For instance for usage-based accounting the required
599	    confidence for the estimation of packet counters can depend on
600	    the monetary value that corresponds to the transfer of one
601	    packet. That means that a higher confidence could be required
602	    for expensive packet flows (e.g. premium IP service) than for
603	    cheaper flows (e.g. best effort). The accuracy requirements for
604	    validating a previously agreed quality can also vary extremely
605	    with the customer demands. These requirements are usually
606	    determined by the service level agreement (SLA).

608	    The Sampling method and the parameters in use must be clearly
609	    communicated to all applications that use the measurement data.
610	    Only with this knowledge a correct interpretation of the
611	    measurement results can be ensured.

613	    Sampling methods can be characterized by the Sampling algorithm,
614	    the trigger type used for starting a Sampling interval and the
615	    length of the Sampling interval. These parameters are described
616	    here in detail. The Sampling algorithm describes the basic
617	    process for selection of samples. In accordance to [AmCa89] and
618	    [ClPB93] we define the following basic Sampling processes:

620	 5.1 Systematic Sampling

622	    Systematic Sampling describes the process of selecting the start
623	    points and the duration of the selection intervals according to
624	    a deterministic function. This can be for instance the periodic
625	    selection of every k-th element of a trace but also the
626	    selection of all packets that arrive at pre-defined points in
627	    time. Even if the selection process does not follow a periodic
628	    function (e.g. if the time between the Sampling intervals varies
629	    over time) we consider this as systematic Sampling as long as
630	    the selection is deterministic.

632	    The use of systematic Sampling always involves the risk of
633	    biasing the results. If the systematics in the Sampling process
634	    resemble systematics in the observed stochastic process
635	    (occurrence of the characteristic of interest in the network),
636	    there is a high probability that the estimation will be biased.
637	    Systematics in the observed process might not be known in
638	    advance.

640	    Here only equally spaced schemes are considered, where triggers
641	    for Sampling are periodic, either in time or in packet count.
642	    All packets occurring in a selection interval (either in time or
643	    packet count) beyond the trigger are selected.

645	    Systematic count-based
646	    In systematic count-based Sampling the start and stop triggers
647	    for the Sampling interval are defined in accordance to the
648	    spatial packet position (packet count).

650	    Systematic time-based
651	    In systematic time-based Sampling time-based start and stop
652	    triggers are used to define the Sampling intervals. All packets
653	    are selected that arrive at the Observation Point within the
654	    time-intervals defined by the start and stop triggers (i.e.
655	    arrival time of the packet is larger than the start time and
656	    smaller than the stop time).

658	    Both schemes are content-independent selection schemes. Content
659	    dependent deterministic Selectors are categorized as filter.

661	 5.2 Random Sampling

663	    Random Sampling selects the starting points of the Sampling
664	    intervals in accordance to a random process. The selection of
665	    elements are independent experiments. With this, unbiased
666	    estimations can be achieved. In contrast to systematic Sampling,
667	    random Sampling requires the generation of random numbers. One
668	    can differentiate two methods of random Sampling:

670	 5.2.1   n-out-of-N Sampling

672	    In n-out-of-N Sampling n elements are selected out of the parent
673	    population that consists of N elements. One example would be to
674	    generate n different random numbers in the range [1,N] and
675	    select all packets which have a packet position equal to one of
676	    the random numbers. For this kind of Sampling the Sample Size n
677	    is fixed.

679	 5.2.2   Probabilistic Sampling

681	    In probabilistic Sampling the decision whether an element is
682	    selected or not is made in accordance to a pre-defined selection
683	    probability. An example would be to flip a coin for each packet
684	    and select all packets for which the coin showed the head. For
685	    this kind of Sampling the Sample Size can vary for different
686	    trials. The selection probability does not necessarily has to be
687	    the same for each packet. Therefore we distinguish between
688	    uniform probabilistic Sampling (with the same selection
689	    probability for all packets) and non-uniform probabilistic
690	    Sampling (where the selection probability can vary for different
691	    packets).

693	 5.2.2.1 Uniform Probabilistic Sampling

695	    For Uniform Probabilistic Sampling packets are selected
696	    independently with a uniform probability p. This Sampling can be
697	    count-driven, and is sometimes referred to as geometric random
698	    Sampling, since the difference in count between successive
699	    selected packets are independent random variables with a
700	    geometric distribution of mean 1/p. A time-driven analog,
701	    exponential random Sampling, has the time between triggers
702	    exponentially distributed.
703	    Both geometric and exponential random Sampling are examples of
704	    what is known as additive random Sampling, defined as Sampling
705	    where the intervals or counts between successive samples are
706	    independent identically distributed random variable.

708	 5.2.2.2 Non-Uniform Probabilistic Sampling

710	    This is a variant of Probabilistic Sampling in which the
711	    Sampling probabilities can depend on the selection process
712	    input. This can be used to weight Sampling probabilities in
713	    order e.g. to boost the chance of Sampling packets that are rare
714	    but are deemed important. Unbiased estimators for quantitative
715	    statistics are recovered by re-normalization of sample values;
716	    see [HT52].

718	 5.2.2.3 Non-Uniform Flow State Dependent Sampling

720	    Another type of Sampling that can be classified as probabilistic
721	    Non-Uniform is closely related to the flow concept as defined in
722	    [RFC3917], and it is only used jointly with a flow monitoring
723	    function (IPFIX metering process). Packets are selected,
724	    dependent on a selection state. The point, here, is that the
725	    selection state is determined also by the state of the flow the
726	    packet belongs to and/or by the state of the other flows
727	    currently being monitored by the associated flow monitoring
728	    function. An example for such an algorithm is the "sample and
729	    hold" method described in [EsVa01]:

731	    - If a packet accounts for a flow record that already exists in
732	       the IPFIX flow recording process, it is selected (i.e. the
733	       flow record is updated)
734	    - If a packet doesn't account to any existing flow record, it is
735	       selected with probability p. If it has been selected a new
736	       flow record has to be created.

738	    A further algorithm that fits into the category of non-uniform
739	    flow state dependent Sampling is described in [Moli03].

741	    This type of Sampling is content dependent because the
742	    identification of the flow the packet belongs to requires
743	    analyzing part of the packet content. If the packet is selected,
744	    then it is passed as an input to the IPFIX monitoring function
745	    (this is called "Local Export" in [PSAMP-FW]. Selecting the
746	    packet depending on the state of a flow cache is useful when
747	    memory resources of the flow monitoring function are scarce
748	    (i.e. there is no room to keep all the flows that have been
749	    scheduled for monitoring).

751	 5.2.2.4 Configuration of non-uniform probabilistic and flow-state
752	       Sampling

754	    Many different specific methods can be grouped under the terms
755	    non-uniform probabilistic and flow state Sampling. Dependent on
756	    the Sampling goal and the implemented scheme, a different number
757	    and type of input parameters is required to configure such
758	    scheme.

760	    Some concrete proposals for such methods exist from the research
761	    community (e.g. [EsVa01],[DuLT01],[Moli03]). Some of these
762	    proposals are still in an early stage and need further
763	    investigations to prove their usefulness and applicability. It
764	    is not our aim to indicate preference amongst these methods.
765	    Instead, we only describe here the basic methods and leave the
766	    specification of explicit schemes and their parameters up to
767	    vendors (e.g. as extension of the information model).

769	 6. Filtering
770	    Filtering is the deterministic selection of packets based on the
771	    packet content, the treatment of the packet at the Observation
772	    Point, or deterministic functions of these occurring in the
773	    selection state. The packet is selected if these quantities fall
774	    into a specified range. The role of Filtering, as the word
775	    itself suggest, is to separate all the packets having a certain
776	    property from those not having it. A distinguishing
777	    characteristic from Sampling is that the selection decision does
778	    not depend on the packet position in time or in the space, or on
779	    a random process.
780	    We identify and describe in the following two Filtering
781	    techniques.

783	 6.1 Property Match Filtering

785	    With this Filtering method a packet is selected if specific
786	    fields within the packet and/or properties of the router state
787	    equal a predefined value. Possible filter fields are all IPFIX
788	    flow attributes specified in [RFC5102]. Further fields can be
789	    defined by proposing new information elements or defining vendor
790	    specific extensions.

792	    A packet is selected if Field=Value. Masks and ranges are only
793	    supported to the extent to which [RFC5102] allows them e.g. by
794	    providing explicit fields like the netmasks for source and
795	    destination addresses.

797	    AND operations are possible by concatenating filters, thus
798	    producing a composite selection operation.  In this case, the
799	    ordering in which the filtering happens is implicitly defined
800	    (outer filters come after inner filters).  However, as long as
801	    the concatenation is on filters only, the result of the cascaded
802	    filter is independent from the order, but the order may be
803	    important for implementation purposes, as the first filter will
804	    have to work at a higher rate.  In any case, an implementation
805	    is not constrained to respect the filter ordering, as long as
806	    the result is the same, and it may even implement the composite
807	    filtering in filtering in one single step.

809	    OR operations are not supported with this basic model.  More
810	    sophisticated filters (e.g. supporting bitmasks, ranges or OR
811	    operations etc.) can be realized as vendor specific schemes.

813	    All IPFIX flow attributes defined in [RFC5102] can be used for
814	    property match filtering. Further information elements can be
815	    easily defined. Typical header fields that should be supported
816	    for property match operations are the following:

818	          (i) the IP header (excluding options in IPv4, stacked
819	               headers in IPv6)

821	         (ii) transport protocol header (e.g. TCP, UDP)

823	         (iii) encapsulation headers (e.g. the MPLS label stack, if
824	               present)

826	    When the PSAMP Device offers property match filtering, and, in
827	    its usual capacity other than in performing PSAMP functions,
828	    identifies or processes information from IP, transport protocol
829	    or encapsulation protocols, then the information should be made
830	    available for filtering.  For example, when a PSAMP Device
831	    routes based on destination IP address, that field should be
832	    made available for filtering.  Conversely, a PSAMP Device that
833	    does not route is not expected to be able to locate an IP
834	    address within a packet, or make it available for Filtering,
835	    although it may do so.

837	    Since packet encryption conceals the real values of encrypted
838	    fields, property match filtering must be configurable to ignore
839	    encrypted packets, when detected.

841	    The Selection Process may support filtering based on the
842	    properties of the router state:

844	         (i)  Ingress interface at which packet arrives equals a
845	               specified value

847	         (ii) Egress interface to which packet is routed to equals a
848	               specified value

850	         (iii) Packet violated Access Control List (ACL) on the
851	               router

853	         (iv)  Failed Reverse Path Forwarding (RPF)

855	         (v)  Failed Resource Reservation (RSVP)

857	         (vi)  No route found for the packet

859	         (vii) Origin Border Gateway Protocol (BGP) Autonomous System
860	               (AS) [RFC4271] equals a specified value or lies within
861	               a given range
862	         (viii)Destination BGP AS equals a specified value or lies
863	               within a given range

865	    Packets that match the Failed Reverse Path Forwarding (RPF)
866	    condition are packets for which ingress filtering failed as
867	    defined in [RFC3704].
868	    Packets that match the Failed Resource Reservation condition are
869	    packets that do not fulfill the RSVP specification as defined in
870	    [RCF2205].

872	    Router architectural considerations may preclude some
873	    information concerning the packet treatment being available at
874	    line rate for selection of packets.  For example, the Selection
875	    Process may not be implemented in the fast path that is able to
876	    access routing state at line rate.  However, when filtering
877	    follows sampling (or some other selection operation) in a
878	    Composite Selector, the rate of the Packet Stream output from
879	    the sampler and input to the filter may be sufficiently slow
880	    that the filter could select based on routing state.

882	 6.2 Hash-based Filtering

884	    A Hash Function h maps the Packet Content c, or some portion of
885	    it, onto a Hash Range R. The packet is selected if h(c) is an
886	    element of S, which is a subset of R called the Hash Selection
887	    Range. Thus Hash-based Selection is a particular case of
888	    Filtering. The object is selected if c is in inv(h(S)). But for
889	    desirable Hash Functions the inverse image inv(h(S)) will be
890	    extremely complex, and hence h would not be expressible as, say,
891	    a Property Match Filter or a simple combination of these.

893	    Hash-based selection is mainly used to realize a coordinated
894	    packet selection. That means that the same packets are selected
895	    at different Observation Points. This is useful for instance to
896	    observe the path (trajectory) that a packet took through the
897	    network or to apply packet selection to passive one-way
898	    measurements.

900	    A pre-requisite for the method to work and to ensure
901	    interoperability is that the same Hash Function with the same
902	    parameters (e.g. input vector) is used at the observation
903	    points.

905	    A consistent packet selection is also possible with property
906	    match filtering. Nevertheless, hash-based selection can be used
907	    to approximate a random selection. The desired statistical
908	    properties are discussed in section 6.2.2.

910	    In the following subsections we give some application examples
911	    for coordinated packet selection.

913	 6.2.1   Application Examples for Coordinated Packet Selection

915	 6.2.1.1 Trajectory Sampling

917	    Trajectory Sampling is the consistent selection of a subset of
918	    packets at either all of a set of Observation Points or none of
919	    them. Trajectory Sampling is realized by Hash-based Selection if
920	    all Observation Points in the set use a common Hash Function,
921	    Hash Domain and selection range. The Hash Domain comprises all
922	    or part of the packet content that is invariant along the packet
923	    path. Fields such as Time-to-Live, which is decremented per hop,
924	    and header CRC, which is recalculated per hop, are thus excluded
925	    from the Hash Domain. The Hash Domain needs to be wider than
926	    just a flow key, if packets are to be selected quasi-randomly
927	    within flows.

929	    The trajectory (or path) followed by a packet is reconstructed
930	    from PSAMP reports on it that reach a Collector. Reports on a
931	    given packet originating from different observations points are
932	    associated by matching a label from the reports. The label may
933	    comprise that portion invariant packet content that is reported,
934	    or possibly some digest of the invariant packet content that is
935	    inserted into the packet report at the Observation Point. Such a
936	    digest may be constructed by applying a second Hash Function
937	    (distinct from that used for selection) to the invariant packet
938	    content. The reconstruction of trajectories, and methods for
939	    dealing with possible ambiguities due to label collisions
940	    (identical labels reported for different packets) and potential
941	    loss of reports in transmission, are dealt with in [DuGr00],
942	    [DuGG02] and [DuGr04].

944	    Applications of trajectory Sampling include (i) estimation of
945	    the network path matrix, i.e., the traffic intensities according
946	    to network path, broken down by flow key; (ii) detection of
947	    routing loops, as indicated by self-intersecting trajectories;
948	    (iii) passive performance measurement: prematurely terminating
949	    trajectories indicate packet loss, packet one way delay can be
950	    determined if reports include (synchronized) timestamps of
951	    packet arrival at the Observation Point; (iv) network attack
952	    tracing, of the actual paths taken by attack packets with
953	    spoofed source addresses.

955	 6.2.1.2 Passive One-way Measurements

957	    Coordinated packet selection can be applied for instance to one-
958	    way delay measurements in order to reduce the required
959	    resources. In one-way delay measurements packets are collected
960	    at different Observation Points in the network. A packet digest
961	    is generated for each packet that helps to identify the packet.
962	    The packet digest and the arrival time of the packet at the
963	    observation point are reported to a process that calculates the
964	    delay. The delay is calculated by subtracting the arrival time
965	    of the same packet at the observation points (e.g. [ZsZC01]).
966	    With high data rates, capturing all packets can require a lot of
967	    resources for storage, transfer and processing. To reduce
968	    resource consumption packet selection methods can be applied.
969	    But for such selection techniques it has to be ensured that the
970	    same packets are collected at different observation points.

972	 6.2.1.3 Generation of Pseudo-random Numbers

974	    Although pseudo-random number generators with well understood
975	    properties have been developed, they may not be the method of
976	    choice in settings where computational resources are scarce. A
977	    convenient alternative is to use Hash Functions of packet
978	    content as a source of randomness. The hash (suitably re-
979	    normalized) is a pseudo-random variate in the interval [0,1].
980	    Other schemes may use packet fields in iterators for pseudo-
981	    random numbers. However, the statistical properties of an ideal
982	    packet selection law (such as independent Sampling for different
983	    packets, or independence on packet content) may not be exactly
984	    rendered by an implementation, but only approximately so.

986	    Use of packet content to generate pseudo-random variates shares
987	    with Non-uniform Probabilistic Sampling (see Section 3.1.2.2.2
988	    above) the property that selection decisions depend on Packet
989	    Content. However, there is a fundamental difference between the
990	    two. In the former case the content determines pseudo-random
991	    variates. In the latter case the content only determines the
992	    selection probabilities: selection could then proceed e.g., by
993	    use of random variates obtained by an independent pseudo-random
994	    number generator.

996	 6.2.2   Desired Properties of Hash Functions

998	    Here we formulate desired properties for hash functions. For
999	    this we have to distinguish whether a hash function is used for
1000	    packet selection or just as a packet digest. The main purpose of
1001	    this document is on packet selection. Nevertheless, we also
1002	    provide some requirements for the use of hash functions as
1003	    packet digest.

1005	    First of all we need to define suitable input fields from the
1006	    packet. In accordance to [DuGr00] input field should be
1007	    - invariant on the path
1008	    - variable among packets

1010	    Only if the input fields are the same at different observation
1011	    points it is possible to recognize the packet. The input fields
1012	    should be variable among packets in order to distribute the hash
1013	    results over the Selection Range.

1015	 6.2.2.1 Requirements for Packet Selection

1017	    In accordance to considerations in [MoND05] and [Henk08] we
1018	    define the following desired properties of hash functions used
1019	    for packet selection:

1021	    (i) Speed: The hash function has to be applied to each packet
1022	    that traverses the observation point. Therefore it has to be
1023	    fast in order to cope with the high packet rates. In the ideal
1024	    case the hash operation should not influence the performance on
1025	    the PSAMP device.

1027	    (ii) Uniformity: The Hash Function h should have good mixing
1028	    properties, in the sense that small changes in the input (e.g.
1029	    the flipping of a single bit) cause large changes in the output
1030	    (many bits change). Then any local clump of values of c is
1031	    spread widely over R by h, and so the distribution of h(c) is
1032	    fairly uniform even if the distribution of c is not. Then the
1033	    Sampling Fraction is #S/#R, which can be tuned by choice of S.

1035	    (iii) Unbiasedness: The selection decision should be as
1036	    independent of packet attributes as possible. The set of
1037	    selected packets should not be biased towards a specific type of
1038	    packets.

1040	    (iv) Representativeness of sample: The sample should be as
1041	    representative as possible for the observed traffic.

1043	    (v) Non-linearity: The function should not be linear. This
1044	    increases the mixing properties (uniformity criterion). In
1045	    addition to this it decreases the predictability of the output
1046	    and therefore the vulnerabilities against attacks.

1048	    (vi) Robustness against vulnerabilities: The hash function
1049	    should be robust against attacks. Potential vulnerabilities are
1050	    described in section 6.2.3.

1052	 6.2.2.2 Requirements for Packet Digesting
1053	    For digesting Packet Content for inclusion in a reported label,
1054	    the most important property is a low collision frequency. A
1055	    secondary requirement is the ability to accept variable length
1056	    input, in order to allow inclusion of maximal amount of packet
1057	    as input. Execution speed is of secondary importance, since the
1058	    digest need only be formed from selected packets.

1060	 6.2.3   Security Considerations for Hash Functions

1062	    A concern for Hash-based Selection is whether some large set of
1063	    related packets could be disproportionately sampled, i.e., that
1064	    the Attained Sampling Fraction is significantly different from
1065	    the Configured Sampling Fraction. This can happen either

1067	    (i)  through unanticipated behavior in the Hash Function, or

1069	    (ii) because the packets had been deliberately crafted to have
1070	       this property.

1072	    The first point underlines the importance of using a Hash
1073	    Function with good mixing properties. For this the statistical
1074	    properties of candidate Hash Functions need to be evaluated.
1075	    Since the hash output depends on the traffic mix, the evaluation
1076	    should be done preferably on up-to-date packet traces from the
1077	    network in which the hash-based selection will be deployed.

1079	    However, hash functions which perform well on typical traffic
1080	    may not be sufficiently strong to withstand attacks specifically
1081	    targeted against them. Such potential attacks have been
1082	    described in [GoRe07].

1084	    The following we point out different potential attack scenarios.
1085	    We encourage the use of standardized hash functions. Therefore
1086	    we assume that the hash function itself is public and hence
1087	    known to an attacker.
1088	    Nevertheless, we also assume the possibility of using a private
1089	    input parameter for the hash function that is kept secret. Such
1090	    an input parameter can for instance be attached to the hash
1091	    input before the hash operation is applied. With this at least
1092	    parts of the hash operation remains secret.

1094	    For the attack scenarios we assume that an attacker uses its
1095	    knowledge of the hash function to craft packets which are then
1096	    dispatched, either as the attack itself, or to elicit further
1097	    information which can be used to refine the attack.

1099	    Two scenarios are considered. In the first scenario, the
1100	    attacker has no knowledge about whether the crafted packets are
1101	    selected or not. In the second scenario the attacker uses some
1102	    knowledge of sampling outcomes. The means by which this might be
1103	    acquired is discussed below. Some additional attacks that
1104	    involve tampering with export packets in transit, as opposed to
1105	    attacking the PSAMP device, are discussed in [GoRe07].

1107	 6.2.3.1 Vulnerabilities of Hash-based selection without knowledge
1108	       of selection outcomes

1110	    (i) The hash function does not use a private parameter.

1112	    If no private input parameter is used, potential attackers can
1113	    easily calculate which packets result in which hash values.
1114	    If the selection range is public, an attacker can craft packets
1115	    whose selection properties are known in advance. If the
1116	    selection range is private, an attacker cannot determine whether
1117	    a crafted packet is selected. However by computing the hash on
1118	    different trial crafted packets, and selecting those yielding a
1119	    given hash value, the attacker can construct an arbitrarily
1120	    large set of distinct packets with a common selection
1121	    properties, i.e., packets that will be either all selected or
1122	    all not selected. This can be done whatever the strength of the
1123	    hash function.

1125	    (ii) The hash function is not cryptographically strong.

1127	    If the hash function is not cryptographically strong, it may be
1128	    possible to construct sequences of distinct packets with the
1129	    common selection property even if a private parameter is used.

1131	    An example is the standard CRC-32 hash function used with a
1132	    private modulus (but without a private string post-pended to the
1133	    input). It has weak mixing properties for low order bits.
1134	    Consequently, simply by incrementing the hash input, one obtains
1135	    distinct packets whose hashes mostly fall in a narrow range, and
1136	    hence are likely commonly selected; see [GoRe07]

1138	    Suitable parameterization of the hash function can make such
1139	    attacks more difficult. For example, post-pending a private
1140	    string to the input before hashing with CRC-32 will give
1141	    stronger mixing properties over all bits of the input. However,
1142	    with a hash function, such as CRC-32, that is not
1143	    cryptographically strong, the possibility of discovering a
1144	    method to construct packet sets with the common selected
1145	    property cannot be ruled out, even when a private modulus or
1146	    post-pended string is used.

1148	 6.2.3.2 Vulnerabilities of Hash-based selection using knowledge of
1149	       selection outcomes

1151	    Knowledge of the selection outcomes of crafted packets can be
1152	    used by an attacker to more easily construct sets of packets
1153	    which are disproportionately sampled and/or are commonly
1154	    selected. For this the attacker does not need any a priori
1155	    knowledge about the hash function or selection range.

1157	    There are several ways an attacker might acquire this knowledge
1158	    about the selection outcome:

1160	    (i) Billing Reports: if samples are used for billing purposes,
1161	    then the selection outcomes of packets may be able to be
1162	    inferred by correlating a crafted packet stream with the billing
1163	    reports that it generates. However, the rate at knowledge of
1164	    selection outcomes can be acquired depends on the temporal and
1165	    spatial granularity of the billing reports, being slower the
1166	    more aggregated the reports are.

1168	    (ii) Feedback from an Intrusion Detection System: e.g., a
1169	    botmaster adversary learns if his packets were detected by the
1170	    intrusion detection system by seeing if one of his bots is
1171	    blocked by the network.

1173	    (iii) Observation of the Report Stream: export packets sent
1174	    across a public network may be eavesdropped on by an adversary.
1175	    Encryption of the export packets provides only a partial
1176	    defense, since it may be possible to infer the selection
1177	    outcomes of packets by correlating a crafted packet stream with
1178	    the occurrence (not the content) of packets in the export stream
1179	    that it generates. The rate at which such knowledge could be
1180	    acquired is limited by the temporal resolution at which reports
1181	    can be associated with packets, e.g. due to processing and
1182	    propagation variability, and difficulty in distinguishing report
1183	    on attack packets from those of background traffic, if present.
1184	    The association between packets and their reports on which this
1185	    depends could be removed by padding export packets to a constant
1186	    length and sending them at a constant rate.

1188	    We now turn to attacks that can exploit knowledge of selection
1189	    outcomes. Firstly, with a non-cryptographic hash function,
1190	    knowledge of selection outcomes for a trial stream may be used
1191	    to further craft a packet set with the common selection
1192	    property. This has been demonstrated for the modular hash f(x) =
1193	    a x + b mod k, for private parameters a, b, and k. With sampling
1194	    rate p, knowledge of the sampling outcomes of roughly 2/p is
1195	    sufficient for the attack to succeed, independent of the values
1196	    of a, b and k. With knowledge of the selection outcomes of a
1197	    larger number of packets, the parameters a b and k can be
1198	    determined; see [GoRe07].

1200	    A cryptographic hash function employing a private parameter and
1201	    operating in one of the pseudo-random function modes specified
1202	    above is not vulnerable to these attacks, even if the selection
1203	    range is known.

1205	 6.2.3.3 Vulnerabilities to Replay Attacks

1207	    Since hash-based selection is deterministic, any packet or set
1208	    of packets with known selection properties can be replayed into
1209	    a network and experience the same selection outcomes provide the
1210	    hash function and its parameters are not changed. Repetition of
1211	    a single packet may be noticeable to other measurement methods
1212	    if employed (e.g. collection of flow statistics), whereas a set
1213	    of distinct packets that appears statistically similar to
1214	    regular traffic may be less noticeable.

1216	    Replay attacks may be mitigated by repeated changing of hash
1217	    function parameters. This also prevents attacks that exploit
1218	    knowledge of sampling outcomes, at least if the parameters are
1219	    changed at least as fast as the knowledge can be acquired by an
1220	    attacker. In order to preserve the ability to perform Trajectory
1221	    Sampling, parameter changed would have to be simultaneous (or
1222	    approximately so) across all observation point.

1224	 6.2.4   Choice of Hash-Function

1226	    The specific choice of hash function represents a trade-off
1227	    between complexity and ease of implementation. Ideally, a
1228	    cryptographically strong hash function employing a private
1229	    parameter and operating in pseudo-random function mode as
1230	    specified above would be used, yielding a good emulation a
1231	    random packet selection at a target sampling rate, and giving
1232	    maximal robustness against the attacks described in the previous
1233	    section. Unfortunately there is currently no single hash
1234	    function that fulfills all the requirements.

1236	    As detailed in section 6.2.3, only cryptographic hash functions
1237	    employing a private parameter operating in pseudo-random
1238	    function mode are sufficiently strong to withstand the range of
1239	    conceivable attacks. For example, fixed or variable length
1240	    inputs could be hashed using a block cipher (like AES) in
1241	    cipher-block-chaining mode.  Fixed length inputs could also be
1242	    hashed using an iterated cryptographic hash function (like MD5
1243	    or SHA1), with a private initial vector.  For variable length
1244	    inputs, iterated cryptographic hash function (like MD5 or SHA1)
1245	    should employ private string post-pended to the data in addition
1246	    to a private initial vector. For more details, see the "append-
1247	    cascade" construction of [BeCK96]. We encourage the use of such
1248	    cryptographically strong hash function wherever possible.

1250	    However, a problem with using such function is the low
1251	    performance. As shown for instance in [Henk08], the computation
1252	    time for MD5 and SHA are about 7-10 times higher compared to
1253	    non-cryptographic functions. The difference increases for small
1254	    hash input lengths.

1256	    Therefore it is not assumed that all PSAMP devices will be
1257	    capable of applying a cryptographically strong hash function to
1258	    every packet at line rate. For this reason, the hash functions
1259	    listed in this section will be of a weaker variety. Future
1260	    protocol extensions that employ stronger hash functions are
1261	    highly welcome.

1263	    Comparisons of hash-functions for packet selection and packet
1264	    digesting with regard to various criteria can be found in
1265	    [MoND05] and [Henk08].

1267	 6.2.4.1 Hash Functions for Packet Selection

1269	    If hash-based packet selection is applied, the BOB function MUST
1270	    be used for packet selection operations in order to be compliant
1271	    with PSAMP. The specification of BOB is given in the appendix.
1272	    Both the parameter (the init value) and the selection range
1273	    should be kept private. The initial vector of the hash function
1274	    MUST be configurable out of band to prevent security breaches
1275	    like exposure of the initial vector content.

1277	    Other functions, such as CRC-32 and IPSX MAY be used.  The IPSX
1278	    function is described in the appendix, the CRC-32 function is
1279	    described in [RFC1141]. If CRC-32 is used, the input should
1280	    first be post-pended with a private string that acts as a
1281	    parameter, and the modulus of the CRC should also be kept
1282	    private.

1284	    IPSX is simple to implement and was correspondingly about an
1285	    order of magnitude faster to execute per packet than BOB or CRC-
1286	    32 [MoND05].

1288	    All three hash functions evaluated showed relatively poor
1289	    uniformity with 16 byte input that was drawn from only invariant
1290	    fields in the IP and TCP/UDP headers (i.e. header fields that do
1291	    not change from hop to hop). IPSX is inherently limited to 16
1292	    bytes.
1293	    BOB and CRC-32 exhibits noticeably better uniformity when 4 or
1294	    more bytes from the payload are also included in the input
1295	    [MoND05].  Also with other criteria BOB performed quite well
1296	    [Henk08]

1298	    Although the characteristics have been checked for different
1299	    traffic traces, results cannot be generalized to arbitrary
1300	    traffic. Since hash-based selection is a deterministic function
1301	    on the packet content, it can always be biased towards packets
1302	    with specific attributes. Furthermore, it should be noted that
1303	    all Hash Functions were evaluated only for IPv4.

1305	    None of these hash functions is recommended for cryptographic
1306	    purposes. Please also note that the use of a private parameter
1307	    only slightly reduces the vulnerabilities against attacks. As
1308	    shown in section 6.2.3. functions that are not cryptographically
1309	    strong (e.g., BOB and CRC) cannot prevent attackers from
1310	    crafting packets that are disproportionally selected even if a
1311	    private parameter is used and the selection range is kept
1312	    secret.

1314	    As described in section 6.2.2 the input bytes for the Hash
1315	    Function need to be invariant along the path the packet is
1316	    traveling. Only with this it is ensured that the same packets
1317	    are selected at different observation points. Furthermore they
1318	    should have a high variability between different packets to
1319	    generate a high variation in the Hash Range. An evaluation of
1320	    the variability of different packet header fields can be found
1321	    in [DuGr00], [HeSZ08] and [Henk08].

1323	    If a hash-based selection with the BOB function is used with
1324	    IPv4 traffic, the following input bytes MUST be used.
1325	    - IP identification field
1326	    - Flags field
1327	    - Fragment offset
1328	    - Source IP address
1329	    - Destination IP address
1330	    - A configurable number of bytes from the IP payload, starting
1331	       at a configurable offset.

1333	    Due to the lack of suitable IPv6 packet traces, all candidate
1334	    Hash Functions in [DuGr00], [MoND05] and [Henk08] were evaluated
1335	    only for IPv4. Due to the IPv6 header fields and address
1336	    structure it is expected that there is less randomness in IPv6
1337	    packet headers than in IPv4 headers. Nevertheless, the
1338	    randomness of IPv6 traffic has not yet been evaluated
1339	    sufficiently to get any evidence. In addition to this, IPv6
1340	    traffic profiles may change significantly in future when IPv6 is
1341	    used by a broader community.

1343	    If a hash-based selection with the BOB function is used with
1344	    IPv6 traffic, the following input bytes MUST be used.
1345	    - Payload length (2 bytes)
1346	    - Byte number 10,11,14,15,16 of the IPv6 source address
1347	    - Byte number 10,11,14,15,16 of the IPv6 destination address
1348	    - A configurable number of bytes from the IP payload, starting
1349	       at a configurable offset. It is recommended to use at least 4
1350	       bytes from the IP payload.

1352	    The payload itself is not changing during the path. Even if some
1353	    routers process some extension headers they are not going to
1354	    strip them from the packet. Therefore the payload length is
1355	    invariant along the path. Furthermore it usually differs for
1356	    different packets. The IPv6 address has 16 bytes. The first part
1357	    is the network part and it contains low variation. The second
1358	    part is the host part and contains higher variation. Therefore
1359	    the second part of the address is used. Nevertheless, the
1360	    uniformity has not been checked for IPv6 traffic.

1362	 6.2.4.2 Hash Functions Suitable for Packet Digesting

1364	    For this purpose also the BOB function SHOULD be used. Other
1365	    functions (such as CRC-32) MAY be used. Among the functions
1366	    capable of operating with variable length input BOB and CRC-32
1367	    have the fastest execution, BOB being slightly faster. IPSX is
1368	    not recommended for digesting because it has a significantly
1369	    higher collision rate and takes only a fixed length input.

1371	 7. Parameters for the Description of Selection Techniques

1373	    This section gives an overview of different alternative
1374	    selection schemes and their required parameters. In order to be
1375	    compliant with PSAMP at least one of proposed schemes MUST be
1376	    implemented.

1378	    The decision whether to select a packet or not is based on a
1379	    function which is performed when the packet arrives at the
1380	    selection process. Packet selection schemes differ in the input
1381	    parameters for the selection process and the functions they
1382	    require to do the packet selection. The following table gives an
1383	    overview.

1385	         Scheme       |   input parameters     |     functions
1386	       ---------------+------------------------+-------------------
1387	        systematic    |    packet position     |  packet counter
1388	        count-based   |    Sampling pattern    |
1389	       ---------------+------------------------+-------------------
1390	        systematic    |      arrival time      |  clock or timer
1391	        time-based    |     Sampling pattern   |
1392	       ---------------+------------------------+-------------------
1393	        random        |  packet position       |  packet counter,
1394	        n-out-of-N    |  Sampling pattern      |  random numbers
1395	                      | (random number list)   |
1396	       ---------------+------------------------+-------------------
1397	        uniform       |        Sampling        |  random function
1398	        probabilistic |      probability       |
1399	       ---------------+------------------------+-------------------
1400	        non-uniform   |e.g. packet position,   | selection function,
1401	        probabilistic |  packet content(parts) |  probability calc.
1402	       ---------------+------------------------+-------------------
1403	        non-uniform   |e.g. flow state,        | selection function,
1404	        flow-state    |  packet content(parts) |  probability calc.
1405	       ---------------+------------------------+-------------------
1406	        property      | packet content(parts)  |  filter function or
1407	        match         | or router state        |  state discovery
1408	       ---------------+------------------------+-------------------
1409	        hash-based    |  packet content(parts) |  Hash Function
1410	       ---------------+------------------------+-------------------

1412	 7.1 Description of Sampling Techniques

1414	    In this section we define what elements are needed to describe
1415	    the most common Sampling techniques. Here the selection function
1416	    is pre-defined and given by the Selector ID.

1418	    Sampler Description:
1419	         SELECTOR_ID
1420	         SELECTOR_TYPE
1421	         SELECTOR_PARAMETERS

1423	    Where:

1425	    SELECTOR_ID:
1426	    Unique ID for the packet sampler.

1428	    SELECTOR_TYPE
1429	    For Sampling processes the SELECTOR TYPE defines what Sampling
1430	    algorithm is used.
1431	    Values: Systematic Count-based | Systematic Time-based | Random
1432	    n-out-of-N | Uniform Probabilistic | Non-uniform Probabilistic |
1433	    Non-uniform Flow-state

1435	    SELECTOR_PARAMETERS
1436	    For Sampling processes the SELECTOR PARAMETERS define the input
1437	    parameters for the process. Interval length in systematic
1438	    Sampling means, that all packets that arrive in this interval
1439	    are selected. The spacing parameter defines the spacing in time
1440	    or number of packets between the end of one Sampling interval
1441	    and the start of the next succeeding interval.

1443	    Case n out of N:
1444	       - Population size N, Sample size n

1446	    Case Systematic Time Based:
1447	       - Interval length (in usec), Spacing (in usec)

1449	    Case Systematic Count Based:
1450	       - Interval length(in packets), Spacing (in packets)

1452	    Case Uniform Probabilistic (with equal probability per packet):
1453	       - Sampling probability p

1455	    Case Non-uniform Probabilistic:
1456	       - Calculation function for Sampling probability p (see also
1457	          section 5.2.2.4)

1459	    Case flow state:
1460	       - Information reported for flow state sampling are not
1461	          defined in this document (see also section 5.2.2.4)

1463	 7.2 Description of Filtering Techniques

1465	    In this section we define what elements are needed to describe
1466	    the most common Filtering techniques. The structure closely
1467	    parallels the one presented for the Sampling techniques.

1469	    Filter Description:
1470	         SELECTOR_ID
1471	         SELECTOR_TYPE
1472	         SELECTOR_PARAMETERS

1474	    Where:

1476	    SELECTOR_ID:

1478	    Unique ID for the packet filter. The ID can be calculated under
1479	    consideration of the SELECTION SEQUENCE and a local ID.

1481	    SELECTOR_TYPE
1482	    For Filtering processes the SELECTOR TYPE defines what Filtering
1483	    type is used.
1484	    Values: Matching | Hashing | Router_state

1486	    SELECTOR_PARAMETERS
1487	    For Filtering processes the SELECTOR PARAMETERS define formally
1488	    the common property of the packet being filtered. For the
1489	    filters of type Matching and Hashing the definitions have a lot
1490	    of points in common.

1492	    Values:

1494	    Case Matching
1495	       - Information Element (from [RFC5102])
1496	       - Value (type in accordance to [RFC5102])

1498	    In case of multiple match criteria, multiple "case matching"
1499	    have to be bound by a logical AND.

1501	    Case Hashing:
1502	       - Hash Domain (Input bits from packet)
1503	            - <Header type = IPv4>
1504	            - <Input bit specification, header part>
1505	            - <Header type =  IPv6>
1506	            - <Input bit specification, header part>
1507	            - <payload byte number N>
1508	            - <Input bit specification, payload part>
1509	       - Hash Function
1510	            - Hash function name
1511	            - Length of input key (eliminate 0x bytes)
1512	            - Output value (length M and bitmask)
1513	            - Hash Selection Range, as a list of non overlapping
1514	              intervals [start value, end value] where value is in
1515	              [0,2^M-1]
1516	            - Additional parameters dependent on specific Hash
1517	              Function (e.g. hash input bits (seed))

1519	    Notes to input bits for Case Hashing:
1520	       - Input bits can be from header part only, from the payload
1521	          part only or from both.
1522	       - The bit specification, for the header part, can be
1523	          specified for IPv4 or IPv6 only, or both

1525	       - In case of IPv4, the bit specification is a sequence of 20
1526	          Hexadecimal numbers [00,FF] specifying a 20 bytes bitmask
1527	          to be applied to the header.
1528	       - In case of IPv6, it is a sequence of 40 Hexadecimal numbers
1529	          [00,FF] specifying a 40 bytes bitmask to be applied to the
1530	          header
1531	       - The bit specification, for the payload part, is a sequence
1532	          of Hexadecimal numbers [00,FF] specifying the bitmask to be
1533	          applied to the first N bytes of the payload, as specified
1534	          by the previous field. In case the Hexadecimal number
1535	          sequence is longer than N, only the first N numbers are
1536	          considered.
1537	       - In case the payload is shorter than N, the Hash Function
1538	          cannot be applied. Other options, like padding with zeros,
1539	          may be considered in the future.
1540	       - A Hash Function cannot be defined on the options field of
1541	          the IPv4 header, neither on stacked headers of IPv6.
1542	       - The Hash Selection Range defines a range of hash-values
1543	          (out of all possible results of the Hash-Operation). If the
1544	          hash result for a specific packet falls in this range, the
1545	          packet is selected. If the value is outside the range, the
1546	          packet is not selected. E.g. if the selection interval
1547	          specification is [1:3], [6:9] all packets are selected for
1548	          which the hash result is 1,2,3,6,7,8, or 9. In all other
1549	          cases the packet is not selected.

1551	    Case Router State:

1553	       - Ingress interface at which the packet arrives equals a
1554	          specified value
1555	       - Egress interface to which the packet is routed equals a
1556	          specified value
1557	       - Packet violated Access Control List (ACL) on the router
1558	       - Reverse Path Forwarding (RPF) failed for the packet
1559	       - Resource Reservation is insufficient for the packet
1560	       - No route found for the packet
1561	       - Origin AS equals a specified value or lies within a given
1562	          range
1563	       - Destination AS equals a specified value or lies within a
1564	          given range

1566	    Note to Case Router State:
1567	       - All Router state entries can be linked by AND operators

1569	 8. Composite Techniques

1571	    Composite schemes are realized by combining the selector IDs
1572	    into a Selection Sequence. The Selection Sequence contains all
1573	    selector IDs that are applied to the packet stream subsequently.
1574	    Some examples of composite schemes are reported below.

1576	 8.1 Cascaded Filtering->Sampling or Sampling->Filtering

1578	    If a filter precedes a Sampling process the role of Filtering is
1579	    to create a set of "parent populations" from a single stream
1580	    that can then be fed independently to different Sampling
1581	    functions, with different parameters tuned for the population
1582	    itself (e.g. if streams of different intensity result from
1583	    Filtering, it may be good to have different Sampling rates). If
1584	    Filtering follows a Sampling process, the same Sampling Fraction
1585	    and type is applied to the whole stream, independently of the
1586	    relative size of the streams resulting from the Filtering
1587	    function. Moreover, also packets not destined to be selected in
1588	    the Filtering operation will "load" the Sampling function. So,
1589	    in principle, Filtering before Sampling allows a more accurate
1590	    tuning of the Sampling procedure, but if filters are too complex
1591	    to work at full line rate (e.g. because they have to access
1592	    router state information), Sampling before Filtering may be a
1593	    need.

1595	 8.2 Stratified Sampling

1597	    Stratified Sampling is one example for using a composite
1598	    technique. The basic idea behind stratified Sampling is to
1599	    increase the estimation accuracy by using a-priori information
1600	    about correlations of the investigated characteristic with some
1601	    other characteristic that is easier to obtain. The a-priori
1602	    information is used to perform an intelligent grouping of the
1603	    elements of the parent population. In this manner, a higher
1604	    estimation accuracy can be achieved with the same Sample Size or
1605	    the Sample Size can be reduced without reducing the estimation
1606	    accuracy.

1608	    Stratified Sampling divides the Sampling process into multiple
1609	    steps. First, the elements of the parent population are grouped
1610	    into subsets in accordance to a given characteristic. This
1611	    grouping can be done in multiple steps. Then samples are taken
1612	    from each subset.

1614	    The stronger the correlation between the characteristic used to
1615	    divide the parent population (stratification variable) and the
1616	    characteristic of interest (for which an estimate is sought
1617	    after), the easier is the consecutive Sampling process and the
1618	    higher is the stratification gain. For instance, if the dividing
1619	    characteristic were equal to the investigated characteristic,
1620	    each element of the sub-group would be a perfect representative
1621	    of that characteristic. In this case it would be sufficient to
1622	    take one arbitrary element out of each subgroup to get the
1623	    actual distribution of the characteristic in the parent
1624	    population. Therefore stratified Sampling can reduce the costs
1625	    for the Sampling process (i.e. the number of samples needed to
1626	    achieve a given level of confidence).

1628	    For stratified Sampling one has to specify classification rules
1629	    for grouping the elements into subgroups and the Sampling scheme
1630	    that is used within the subgroups. The classification rules can
1631	    be expressed by multiple filters. For the Sampling scheme within
1632	    the subgroups the parameters have to be specified as described
1633	    above. The use of stratified Sampling methods for measurement
1634	    purposes is described for instance in [ClPB93] and [Zseb03].

1636	 9. Security Considerations

1638	    Security considerations concerning the choice of sampling hash
1639	    function have been discussed in Section 6.2.2. That section
1640	    discussed a number of potential attacks to craft packet streams
1641	    which are disproportionately detected and/or discover the hash
1642	    function parameters, the vulnerabilities of different hash
1643	    functions to these attacks, and practices to minimize these
1644	    vulnerabilities.

1646	    In addition to this a user can gains knowledge about the start
1647	    and stop triggers in time-based systematic sampling e.g. by
1648	    sending test packets. This knowledge might allow users to modify
1649	    their send schedule in a way that their packets are
1650	    disproportionately selected or not selected [GoRe07].

1652	    For random sampling cryptographically-strong random number
1653	    generator should be used in order to prevent that an advisory
1654	    can predict the selection decision [GoRe07].

1656	    Further security threats can occur when sampling parameters are
1657	    configured or communicated to other entities. The configuration
1658	    and reporting of sampling parameters are out of scope of this
1659	    document. Therefore the security threats that originate from
1660	    this kind of communication cannot be assessed with the
1661	    information given in this document.

1663	    Some of these threats can probably be addressed by keeping
1664	    configuration information confidential and by authenticating
1665	    entities that configure sampling. Nevertheless a full analysis
1666	    and assessment of threats for configuration and reporting has to
1667	    be done if configuration or reporting methods are proposed.

1669	 10. Acknowledgements

1671	    We would like to thank the PSAMP group, especially Benoit Claise
1672	    and Stewart Bryant, for fruitful discussions and for
1673	    proofreading the document. We thank Sharon Goldberg for her
1674	    input on security issues concerning hash-based selection.

1676	 11. IANA Considerations

1678	    This document has no actions for IANA.

1680	 12. Normative References

1682	    [RFC2119]   Bradner, S., Key words for use in RFCs to Indicate
1683	                 Requirement Levels, BCP 14, RFC 2119, March 1997

1685	 13. Informative References

1687	    [AmCa89]    Paul D. Amer, Lillian N. Cassel, "Management of
1688	                 Sampled Real-Time Network Measurements", 14th
1689	                 Conference on Local Computer Networks, October
1690	                 1989, Minneapolis, pages 62-68, IEEE, 1989.

1692	    [BeCK96]    M. Bellare, R. Canetti and H. Krawczyk,
1693	                 "Pseudorandom Functions Revisited: The Cascade
1694	                 Construction and its Concrete Security", Symposium
1695	                 on Foundations of Computer Science, 1996.

1697	    [ClPB93]    K.C. Claffy, George C. Polyzos, Hans-Werner Braun,
1698	                 "Application of Sampling Methodologies to Network
1699	                 Traffic Characterization", Proceedings of ACM
1700	                 SIGCOMM'93, San Francisco, CA, USA, September 13 -
1701	                 17, 1993.

1703	    [DuGG02]    N.G. Duffield, A. Gerber, M. Grossglauser,
1704	                 "Trajectory Engine: A Backend for Trajectory
1705	                 Sampling", IEEE Network Operations and Management
1706	                 Symposium 2002, Florence, Italy, April 15-19, 2002.

1708	    [DuGr00]    N.G. Duffield, M. Grossglauser, "Trajectory
1709	                 Sampling for Direct Traffic Observation",
1710	                 Proceedings of ACM SIGCOMM 2000, Stockholm, Sweden,
1711	                 August 28 - September 1, 2000.

1713	    [DuGr04]    N. G. Duffield and M. Grossglauser "Trajectory
1714	                 Sampling with Unreliable Reporting", Proc IEEE
1715	                 Infocom 2004, Hong Kong, March 2004.

1717	    [DuLT01]    N.G. Duffield, C. Lund, and M. Thorup, "Charging
1718	                 from Sampled Network Usage", ACM Internet
1719	                 Measurement Workshop IMW 2001, San Francisco, USA,
1720	                 November 1-2, 2001.

1722	    [EsVa01]    C. Estan and G. Varghese, "New Directions in
1723	                 Traffic Measurement and Accounting", ACM SIGCOMM
1724	                 Internet Measurement Workshop 2001, San Francisco
1725	                 (CA) Nov. 2001.

1727	    [GoRe07]    S. Goldberg, J. Rexford, "Security Vulnerabilities
1728	                 and Solutions for Packet Sampling", IEEE Sarnoff
1729	                 Symposium, Princeton, NJ, May 2007.

1731	    [HT52]      D.G. Horvitz and D.J. Thompson, "A Generalization
1732	                 of Sampling without replacement from a Finite
1733	                 Universe" J. Amer. Statist. Assoc. Vol. 47, pp.
1734	                 663-685, 1952.

1736	    [Henk08]    Christian Henke, Evaluation of Hash Functions for
1737	                 Multipoint Sampling in IP Networks, Diploma Thesis,
1738	                 TU Berlin, April 2008.

1740	    [HeSZ08]    Christian Henke, Carsten Schmoll, Tanja Zseby,
1741	                 Evaluation of Header Field Entropy for Hash-Based
1742	                 Packet Selection, Proceedings of Passive and Active
1743	                 Measurement Conference PAM 2008, Cleveland, Ohio,
1744	                 USA, April 2008.

1746	    [RFC5102]   J. Quittek, S. Bryant, B. Claise, P. Aitken, J.
1747	                 Meyer, "Information Model for IP Flow Information
1748	                 Export", RFC 5102, January 2008.

1750	    [RFC5101]   B. Claise (Editor) "Specification of the IPFIX
1751	                 Protocol for the Exchange of IP Traffic Flow
1752	                 Information", RFC 5101, January 2008.

1754	    [Jenk97]    B. Jenkins, "Algorithm Alley", Dr. Dobb's Journal,
1755	                 September 1997.
1756	                 http://burtleburtle.net/bob/hash/doobs.html

1758	    [JePP92]    Jonathan Jedwab, Peter Phaal, Bob Pinna, "Traffic
1759	                 Estimation for the Largest Sources on a Network,
1760	                 Using Packet Sampling with Limited Storage", HP
1761	                 technical report, Managemenr, Mathematics and
1762	                 Security Department, HP Laboratories, Bristol,
1763	                 March 1992,
1764	                 http://www.hpl.hp.com/techreports/92/HPL-92-35.html

1766	    [Moli03]    M.Molina, "A scalable and efficient methodology for
1767	                 flow monitoring in the Internet", International
1768	                 Teletraffic Congress (ITC-18), Berlin, Sep. 2003

1770	    [MoND05]    M. Molina, S.Niccolini, N.G.Duffield "A Comparative
1771	                 Experimental Study of Hash Functions Applied to
1772	                 Packet Sampling" International Teletraffic Congress
1773	                 (ITC-19), Beijing, August 2005.

1775	    [PSAMP-FW]  Nick Duffield (Ed.), "A Framework for Packet
1776	                 Selection and Reporting", RFC XXXX [currently
1777	                 Internet Draft draft-ietf-psamp-framework-11, work
1778	                 in progress, May 2007].

1780	    [PSAMP-INFO] T. Dietz, F. Dressler, G. Carle, B. Claise,
1781	                 "Information Model for Packet Sampling Exports",
1782	                 RFC XXXX. [Currently Internet Draft, draft-ietf-
1783	                 psamp-info-06, June 2007]

1785	    [PSAMP-PROTO] B. Claise (Ed.), "Packet Sampling (PSAMP) Protocol
1786	                 Specifications", RFC XXXX. [Currently Internet
1787	                 Draft draft-ietf-psamp-protocol-07.txt, work in
1788	                 progress, October 2006].

1790	    [RFC1141]   T. Mallory, A. Kullberg, "Incremental Updating of
1791	                 the Internet Checksum", RFC 1141, January 1990
1792	                 (updated by RFC1624).

1794	    [RFC1624]   A. Rijsinghani, Computation of the Internet
1795	                 Checksum via Incremental Update, RFC1624, May 1994

1797	    [RFC2205]   R. Braden (Ed.), L. Zhang, S. Berson, S. Herzog, S.
1798	                 Jamin, Resource ReSerVation Protocol (RSVP) -
1799	                 Version 1 Functional Specification, RFC2205,
1800	                 September 1997

1802	    [RFC3704]   F. Baker, P. Savola, Ingress Filtering for
1803	                 Multihomed Networks, RFC3704, March 2004

1805	    [RFC3917]   J. Quittek, T. Zseby, B. Claise, S. Zander,
1806	                 "Requirements for IP Flow Information Export", RFC
1807	                 3917, October 2004.

1809	    [RFC4271]   Y. Rekhter, T. Li, S. Hares, "A Border Gateway
1810	                 Protocol 4 (BGP-4)", RFC 4271, January 2006.

1812	    [Zseb03]    T. Zseby, "Stratification Strategies for Sampling-
1813	                 based Non-intrusive Measurement of One-way Delay",
1814	                 Proceedings of Passive and Active Measurement
1815	                 Workshop (PAM 2003), La Jolla, CA, USA, pp. 171-
1816	                 179, April 2003.

1818	    [ZsZC01]    Tanja Zseby, Sebastian Zander, Georg Carle.
1819	                 Evaluation of Building Blocks for Passive One-way-
1820	                 delay Measurements. Proceedings of Passive and
1821	                 Active Measurement Workshop (PAM 2001), Amsterdam,
1822	                 The Netherlands, April 23-24, 2001.

1824	 14. Authors' Addresses

1826	    Tanja Zseby
1827	    Fraunhofer Institute for Open Communication Systems
1828	    Kaiserin-Augusta-Allee 31
1829	    10589 Berlin
1830	    Germany
1831	    Phone: +49-30-34 63 7153
1832	    Email: tanja.zseby@fokus.fraunhofer.de

1834	    Maurizio Molina
1835	    DANTE
1836	    City House
1837	    126-130 Hills Road
1838	    Cambridge CB21PQ
1839	    United Kingdom
1840	    Phone: +44 1223 371 300
1841	    Email: maurizio.molina@dante.org.uk

1843	    Nick Duffield
1844	    AT&T Labs - Research
1845	    Room B-139
1846	    180 Park Ave
1847	    Florham Park NJ 07932, USA
1848	    Phone: +1 973-360-8726
1849	    Email: duffield@research.att.com

1851	    Saverio Niccolini
1852	    Network Laboratories, NEC Europe Ltd.
1853	    Kurfuerstenanlage 36
1854	    69115 Heidelberg
1855	    Germany
1856	    Phone: +49-6221-9051118
1857	    Email:  saverio.niccolini@netlab.nec.de

1859	    Fredric Raspall
1860	    EPSC-UPC
1861	    Dept. of Telematics
1862	    Av. del Canal Olimpic, s/n
1863	    Edifici C4
1864	    E-08860 Castelldefels, Barcelona
1865	    Spain
1866	    Email: fredi@entel.upc.es

1868	 15. Contributors

1870	    Sharon Goldberg contributed to the security considerations
1871	    for hash-based selection.

1873	    Sharon Goldberg
1874	    Department of Electrical Engineering
1875	    Princeton University
1876	    F210-K EQuad
1877	    Princeton, NJ 08544, USA
1878	    Email: goldbe@princeton.edu

1880	 16. Intellectual Property Statement

1882	    The IETF has been notified of intellectual property rights
1883	    claimed in regard to some or all of the specification contained
1884	    in this document. For more information consult the online list
1885	    of claimed rights.

1887	    The IETF takes no position regarding the validity or scope of
1888	    any Intellectual Property Rights or other rights that might be
1889	    claimed to pertain to the implementation or use of the
1890	    technology described in this document or the extent to which any
1891	    license under such rights might or might not be available; nor
1892	    does it represent that it has made any independent effort to
1893	    identify any such rights.  Information on the procedures with
1894	    respect to rights in RFC documents can be found in BCP 78 and
1895	    BCP 79.

1897	    Copies of IPR disclosures made to the IETF Secretariat and any
1898	    assurances of licenses to be made available, or the result of an
1899	    attempt made to obtain a general license or permission for the
1900	    use of such proprietary rights by implementers or users of this
1901	    specification can be obtained from the IETF on-line IPR
1902	    repository at http://www.ietf.org/ipr.

1904	    The IETF invites any interested party to bring to its attention
1905	    any copyrights, patents or patent applications, or other
1906	    proprietary rights that may cover technology that may be
1907	    required to implement this standard. Please address the
1908	    information to the IETF at ietf-ipr@ietf.org.

1910	 17. Copyright Statement

1912	    Copyright (C) The IETF Trust (2008).

1914	    This document is subject to the rights, licenses and
1915	    restrictions contained in BCP 78, and except as set forth
1916	    therein, the authors retain all their rights.

1918	 18. Disclaimer

1920	    This document and the information contained herein are provided
1921	    on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
1922	    REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY,
1923	    THE IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM
1924	    ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO
1925	    ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT
1926	    INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY
1927	    OR FITNESS FOR A PARTICULAR PURPOSE.

1929	 Appendix A: Hash Functions

1931	 A.1 IP Shift-XOR (IPSX) Hash Function

1933	    The IPSX Hash Function is tailored for acting on IP version 4
1934	    packets. It exploits the structure of IP packet and in
1935	    particular the variability expected to be exhibited within
1936	    different fields of the IP packet in order to furnish a hash
1937	    value with little apparent correlation with individual packet
1938	    fields. Fields from the IPv4 and TCP/UDP headers are used as
1939	    input. The IPSX Hash Function uses a small number of simple
1940	    instructions.

1942	    Input parameters: None

1944	    Built-in parameters: None

1946	    Output: The output of the IPSX is a 16 bit number

1948	    Functioning:

1950	    The functioning can be divided into two parts: input selection,
1951	    which forms are composite input from various portions of the IP
1952	    packet, followed by computation of the hash on the composite.

1954	    Input Selection:
1955	    The raw input is drawn from the first 20 bytes of the IP packet
1956	    header and the first 8 bytes of the IP payload. If IP options
1957	    are not used, the IP header has 20 bytes, and hence the two
1958	    portions adjoin and comprise the first 28 bytes of the IP
1959	    packet. We now use the raw input as 4 32-bit subportions of
1960	    these 28 bytes. We specify the input by bit offsets from the
1961	    start of IP header or payload.

1963	    f1 = bits 32 to 63 of the IP header, comprising the IP
1964	         identification field, flags, and fragment offset.

1966	    f2 = bits 96 to 127 of the IP header, the source IP address.

1968	    f3 = bits 128 to 159 of the IP header, the destination IP
1969	         address.

1971	    f4 = bits 32 to 63 of the IP payload. For a TCP packet, f4
1972	         comprises the TCP sequence number followed by the message
1973	         length. For a UDP packet f4 comprises the UDP checksum.

1975	    Hash Computation:
1976	    The hash is computed from f1, f2, f3 and f4 by a combination of
1977	    XOR (^), right shift (>>) and left shift (<<) operations. The
1978	    intermediate quantities h1, v1, v2 are 32-bit numbers.

1980	           1.    v1 = f1 ^ f2;
1981	           2.    v2 = f3 ^ f4;
1982	           3.    h1 = v1 << 8;
1983	           4.    h1 ^= v1 >> 4;
1984	           5.    h1 ^= v1 >> 12;
1985	           6.    h1 ^= v1 >> 16;
1986	           7.    h1 ^= v2 << 6;
1987	           8.    h1 ^= v2 << 10;
1988	           9.    h1 ^= v2 << 14;
1989	           10.   h1 ^= v2 >> 7;

1991	    The output of the hash is the least significant 16 bits of h1.

1993	 A.2 BOB Hash Function

1995	    The BOB Hash Function is a Hash Function designed for having
1996	    each bit of the input affecting every bit of the return value
1997	    and using both 1-bit and 2-bit deltas to achieve the so called
1998	    avalanche effect [Jenk97]. This function was originally built
1999	    for hash table lookup with fast software implementation.

2001	    Input Parameters:
2002	    The input parameters of such a function are:
2003	    - the length of the input string (key) to be hashed, in bytes.
2004	    The elementary input blocks of Bob hash are the single bytes,
2005	    therefore no padding is needed.
2006	    - an init value (an arbitrary 32-bit number).

2008	    Built in parameters:
2009	    The Bob Hash uses the following built-in parameter:
2010	    - the golden ratio (an arbitrary 32-bit number used in the hash
2011	    function computation: its purpose is to avoid mapping all zeros
2012	    to all zeros);

2014	    Note: the mix sub-function (see mix (a,b,c) macro in the
2015	    reference code in 3.2.4) has a number of parameters governing
2016	    the shifts in the registers. The one presented is not the only
2017	    possible choice.

2019	    It is an open point whether these may be considered additional
2020	    built-in parameters to specify at function configuration.

2022	    Output.
2023	    The output of the BOB function is a 32-bit number. It should be
2024	    specified:
2025	    - A 32 bit mask to apply to the output
2026	    - The selection range as a list of non overlapping intervals
2027	    [start value, end value] where value is in [0,2^32]

2029	    Functioning:
2030	    The hash value is obtained computing first an initialization of
2031	    an internal state (composed of 3 32-bit numbers, called a, b, c
2032	    in the reference code below), then, for each input byte of the
2033	    key the internal state is combined by addition and mixed using
2034	    the mix sub-function. Finally, the internal state mixed one last
2035	    time and the third number of the state (c) is chosen as the
2036	    return value.

2038	    typedef unsigned long int  ub4;   /* unsigned 4-byte quantities
2039	    */
2040	    typedef unsigned      char ub1;   /* unsigned 1-byte quantities
2041	    */

2043	    #define hashsize(n) ((ub4)1<<(n))
2044	    #define hashmask(n) (hashsize(n)-1)
2045	    /* ------------------------------------------------------
2046	      mix -- mix 3 32-bit values reversibly.
2047	      For every delta with one or two bits set, and the deltas of
2048	    all three high bits or all three low bits, whether the original
2049	    value of a,b,c is almost all zero or is uniformly distributed,
2050	      * If mix() is run forward or backward, at least 32 bits in
2051	    a,b,c have at least 1/4 probability of changing.
2052	      * If mix() is run forward, every bit of c will change between
2053	    1/3 and 2/3 of the time.  (Well, 22/100 and 78/100 for some 2-
2054	    bit deltas.) mix() was built out of 36 single-cycle latency
2055	    instructions in a structure that could supported 2x parallelism,
2056	    like so:
2057	            a -= b;
2058	            a -= c; x = (c>>13);
2059	            b -= c; a ^= x;
2060	            b -= a; x = (a<<8);
2061	            c -= a; b ^= x;
2062	            c -= b; x = (b>>13);
2063	            ...
2064	    Unfortunately, superscalar Pentiums and Sparcs can't take
2065	    advantage of that parallelism.  They've also turned some of
2066	    those single-cycle latency instructions into multi-cycle latency
2067	    instructions

2069	    ------------------------------------------------------------*/

2071	      #define mix(a,b,c)  \
2072	      { \
2073	        a -= b; a -= c; a ^= (c>>13); \
2074	        b -= c; b -= a; b ^= (a<<8); \
2075	        c -= a; c -= b; c ^= (b>>13); \
2076	        a -= b; a -= c; a ^= (c>>12);  \
2077	        b -= c; b -= a; b ^= (a<<16); \
2078	        c -= a; c -= b; c ^= (b>>5); \
2079	        a -= b; a -= c; a ^= (c>>3);  \
2080	        b -= c; b -= a; b ^= (a<<10); \
2081	        c -= a; c -= b; c ^= (b>>15); \
2082	      }

2084	      /* -----------------------------------------------------------
2085	    hash() -- hash a variable-length key into a 32-bit value
2086	    k       : the key (the unaligned variable-length array of bytes)
2087	    len     : the length of the key, counting by bytes
2088	    initval : can be any 4-byte value
2089	    Returns a 32-bit value.  Every bit of the key affects every bit
2090	    of the return value.  Every 1-bit and 2-bit delta achieves
2091	    avalanche. About 6*len+35 instructions.

2093	    The best hash table sizes are powers of 2.  There is no need to
2094	    do mod a prime (mod is sooo slow!).  If you need less than 32
2095	    bits, use a bitmask.  For example, if you need only 10 bits, do
2096	    h = (h & hashmask(10));
2097	    In which case, the hash table should have hashsize(10) elements.

2099	    If you are hashing n strings (ub1 **)k, do it like this:
2100	    for (i=0, h=0; i<n; ++i) h = hash( k[i], len[i], h);

2102	    By Bob Jenkins, 1996.  bob_jenkins@burtleburtle.net.  You may
2103	    use this code any way you wish, private, educational, or
2104	    commercial.  It's free. See
2105	    http://burtleburtle.net/bob/hash/evahash.html
2106	    Use for hash table lookup, or anything where one collision in
2107	    2^^32 is acceptable.  Do NOT use for cryptographic purposes.
2108	     ----------------------------------------------------------- */

2110	      ub4 bob_hash(k, length, initval)
2111	      register ub1 *k;        /* the key */
2112	      register ub4  length;   /* the length of the key */
2113	      register ub4  initval;  /* an arbitrary value */
2114	      {
2115	         register ub4 a,b,c,len;

2117	         /* Set up the internal state */
2118	         len = length;
2119	         a = b = 0x9e3779b9; /*the golden ratio; an arbitrary value
2120	    */
2121	         c = initval;         /* another arbitrary value */

2123	    /*------------------------------------ handle most of the key */

2125	         while (len >= 12)
2126	         {
2127	            a += (k[0] +((ub4)k[1]<<8) +((ub4)k[2]<<16)
2128	    +((ub4)k[3]<<24));
2129	            b += (k[4] +((ub4)k[5]<<8) +((ub4)k[6]<<16)
2130	    +((ub4)k[7]<<24));
2131	            c += (k[8] +((ub4)k[9]<<8)
2132	    +((ub4)k[10]<<16)+((ub4)k[11]<<24));
2133	            mix(a,b,c);
2134	            k += 12; len -= 12;
2135	         }

2137	         /*---------------------------- handle the last 11 bytes */
2138	         c += length;
2139	         switch(len)       /* all the case statements fall through*/
2140	         {
2141	         case 11: c+=((ub4)k[10]<<24);
2142	         case 10: c+=((ub4)k[9]<<16);
2143	         case 9 : c+=((ub4)k[8]<<8);
2144	            /* the first byte of c is reserved for the length */
2145	         case 8 : b+=((ub4)k[7]<<24);
2146	         case 7 : b+=((ub4)k[6]<<16);
2147	         case 6 : b+=((ub4)k[5]<<8);
2148	         case 5 : b+=k[4];
2149	         case 4 : a+=((ub4)k[3]<<24);
2150	         case 3 : a+=((ub4)k[2]<<16);
2151	         case 2 : a+=((ub4)k[1]<<8);
2152	         case 1 : a+=k[0];
2153	           /* case 0: nothing left to add */
2154	         }
2155	         mix(a,b,c);
2156	         /*-------------------------------- report the result */
2157	         return c;
2158	      }