idnits 2.17.1 draft-ietf-psamp-sample-tech-11.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** It looks like you're using RFC 3978 boilerplate. You should update this to the boilerplate described in the IETF Trust License Policy document (see https://trustee.ietf.org/license-info), which is required now. -- Found old boilerplate from RFC 3978, Section 5.1 on line 22. -- Found old boilerplate from RFC 3978, Section 5.5, updated by RFC 4748 on line 1927. -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 1895. -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 1902. -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 1908. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- == There are 14 instances of lines with non-RFC6890-compliant IPv4 addresses in the document. If these are example addresses, they should be changed. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust Copyright Line does not match the current year == Line 2038 has weird spacing: '...ong int ub4; ...' -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (July 9, 2008) is 5768 days in the past. Is this intentional? -- Found something which looks like a code comment -- if you have code sections in the document, please surround them with '' and '' lines. Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'PSAMP-TECH' is mentioned on line 173, but not defined -- Looks like a reference, but probably isn't: '1' on line 2127 == Missing Reference: 'N' is mentioned on line 674, but not defined == Missing Reference: 'RCF2205' is mentioned on line 870, but not defined -- Looks like a reference, but probably isn't: '0' on line 2152 -- Looks like a reference, but probably isn't: '00' on line 1532 == Missing Reference: 'FF' is mentioned on line 1532, but not defined -- Looks like a reference, but probably isn't: '2' on line 2127 -- Looks like a reference, but probably isn't: '4' on line 2148 -- Looks like a reference, but probably isn't: '5' on line 2129 -- Looks like a reference, but probably isn't: '6' on line 2129 -- Looks like a reference, but probably isn't: '8' on line 2131 -- Looks like a reference, but probably isn't: '9' on line 2131 == Unused Reference: 'RFC1624' is defined on line 1794, but no explicit reference was found in the text == Unused Reference: 'RFC2205' is defined on line 1797, but no explicit reference was found in the text -- Obsolete informational reference (is this intentional?): RFC 5102 (Obsoleted by RFC 7012) -- Obsolete informational reference (is this intentional?): RFC 5101 (Obsoleted by RFC 7011) == Outdated reference: A later version (-13) exists of draft-ietf-psamp-framework-11 == Outdated reference: A later version (-11) exists of draft-ietf-psamp-info-06 == Outdated reference: A later version (-09) exists of draft-ietf-psamp-protocol-07 -- Duplicate reference: RFC1624, mentioned in 'RFC1624', was also mentioned in 'RFC1141'. Summary: 1 error (**), 0 flaws (~~), 12 warnings (==), 20 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Internet Draft T. Zseby 3 Document: Fraunhofer FOKUS 4 Intended status: Proposed Standard M. Molina 5 Expires: December 2008 DANTE 6 N. Duffield 7 AT&T Labs-Research 8 S. Niccolini 9 NEC Europe Ltd. 10 F. Raspall 11 EPSC-UPC 12 July 9, 2008 14 Sampling and Filtering Techniques for IP Packet Selection 16 Status of this Memo 18 By submitting this Internet-Draft, each author represents that 19 any applicable patent or other IPR claims of which he or she is 20 aware have been or will be disclosed, and any of which he or she 21 becomes aware will be disclosed, in accordance with Section 6 of 22 BCP 79. 24 Internet-Drafts are working documents of the Internet 25 Engineering Task Force (IETF), its areas, and its working 26 groups. Note that other groups may also distribute working 27 documents as Internet-Drafts. 29 Internet-Drafts are draft documents valid for a maximum of six 30 months and may be updated, replaced, or obsoleted by other 31 documents at any time. It is inappropriate to use Internet- 32 Drafts as reference material or to cite them other than as "work 33 in progress." 35 The list of current Internet-Drafts can be accessed at 36 http://www.ietf.org/ietf/1id-abstracts.txt. 38 The list of Internet-Draft Shadow Directories can be accessed at 39 http://www.ietf.org/shadow.html. 41 This Internet-Draft will expire on December, 2008. 43 Copyright Notice 45 Copyright (C) The IETF Trust (2008). 47 Abstract 49 This document describes Sampling and Filtering techniques for IP 50 packet selection. It provides a categorization of schemes and 51 defines what parameters are needed to describe the most common 52 selection schemes. Furthermore it shows how techniques can be 53 combined to build more elaborate packet Selectors. The document 54 provides the basis for the definition of information models for 55 configuring selection techniques in Metering Processes and for 56 reporting the technique in use to a Collector. 58 Conventions used in this document 60 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL 61 NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and 62 "OPTIONAL" in this document are to be interpreted as described 63 in RFC 2119 [RFC2119]. 65 Table of Contents 67 1. Introduction................................................. 4 68 2. PSAMP Documents Overview..................................... 5 69 3. Terminology.................................................. 5 70 3.1 Observation Points, Packet Streams and Packet Content..... 5 71 3.2 Selection Process......................................... 6 72 3.3 Reporting................................................. 8 73 3.4 Metering Process ......................................... 8 74 3.5 Exporting Process......................................... 8 75 3.6 PSAMP Device.............................................. 9 76 3.7 Collector................................................. 9 77 3.8 Selection Methods......................................... 9 78 4. Categorization of Packet Selection Techniques............... 12 79 5. Sampling.................................................... 14 80 5.1 Systematic Sampling...................................... 14 81 5.2 Random Sampling.......................................... 15 82 5.2.1 n-out-of-N Sampling...................................... 15 83 5.2.2 Probabilistic Sampling................................... 16 84 5.2.2.1 Uniform Probabilistic Sampling........................... 16 85 5.2.2.2 Non-Uniform Probabilistic Sampling....................... 16 86 5.2.2.3 Non-Uniform Flow State Dependent Sampling................ 16 87 5.2.2.4 Configuration of non-uniform probabilistic and flow- 88 state Sampling.......................................... 17 89 6. Filtering................................................... 17 90 6.1 Property Match Filtering................................. 18 91 6.2 Hash-based Filtering..................................... 20 92 6.2.1 Application Examples for Coordinated Packet Selection ... 21 93 6.2.1.1 Trajectory Sampling...................................... 21 94 6.2.1.2 Passive One-way Measurements............................. 21 95 6.2.1.3 Generation of Pseudo-random Numbers...................... 22 96 6.2.2 Desired Properties of Hash Functions..................... 22 97 6.2.2.1 Requirements for Packet Selection........................ 23 98 6.2.2.2 Requirements for Packet Digesting........................ 23 99 6.2.3 Security Considerations for Hash Functions............... 24 100 6.2.3.1 Vulnerabilities of Hash-based selection without 101 knowledge of selection outcomes......................... 25 102 6.2.3.2 Vulnerabilities of Hash-based selection using knowledge 103 of selection outcomes................................... 26 104 6.2.3.3 Vulnerabilities to Replay Attacks........................ 27 105 6.2.4 Choice of Hash-Function.................................. 27 106 6.2.4.1 Hash Functions for Packet Selection...................... 28 107 6.2.4.2 Hash Functions Suitable for Packet Digesting............. 30 108 7. Parameters for the Description of Selection Techniques...... 30 109 7.1 Description of Sampling Techniques....................... 31 110 7.2 Description of Filtering Techniques...................... 32 111 8. Composite Techniques........................................ 34 112 8.1 Cascaded Filtering->Sampling or Sampling->Filtering...... 35 113 8.2 Stratified Sampling...................................... 35 114 9. Security Considerations..................................... 36 115 10. Acknowledgements............................................ 37 116 11. IANA Considerations......................................... 37 117 12. Normative References........................................ 37 118 13. Informative References...................................... 37 119 14. Authors' Addresses.......................................... 40 120 15. Contributors................................................ 41 121 16. Intellectual Property Statement............................. 41 122 17. Copyright Statement......................................... 42 123 18. Disclaimer.................................................. 42 124 Appendix A: Hash Functions....................................... 42 125 A.1 IP Shift-XOR (IPSX) Hash Function............................ 42 126 A.2 BOB Hash Function............................................ 43 128 1. Introduction 130 There are two main drivers for the growth in measurement 131 infrastructures and their underlying technology. First, network 132 data rates are increasing, with a concomitant growth in 133 measurement data. Secondly, the growth is compounded by the 134 demand of measurement-based applications for increasingly fine 135 grained traffic measurements. Devices such as routers, which 136 perform the measurements, require increasingly sophisticated and 137 resource intensive measurement capabilities, including the 138 capture of packet headers or even parts of the payload, and 139 classification for flow analysis. All these factors can lead to 140 an overwhelming amount of measurement data, resulting in high 141 demands on resources for measurement, storage, transfer and post 142 processing. 144 The sustained capture of network traffic at line rate can be 145 performed by specialized measurement hardware. However, the cost 146 of the hardware and the measurement infrastructure required to 147 accommodate the measurements preclude this as a ubiquitous 148 approach. Instead some form of data reduction at the point of 149 measurement is necessary. 150 This can be achieved by an intelligent packet selection through 151 Sampling or Filtering. Another way to reduce the amount of data 152 is to use aggregation techniques (not addressed in this 153 document). The motivation for Sampling is to select a 154 representative subset of packets that allow accurate estimates 155 of properties of the unsampled traffic to be formed. The 156 motivation for Filtering is to remove all packets that are not 157 of interest. Aggregation combines data and allows compact pre- 158 defined views of the traffic. Examples of applications that 159 benefit from packet selection are given in [PSAMP-FW]. 160 Aggregation techniques are out of scope of this document. 162 2. PSAMP Documents Overview 164 This document is one out of a series of documents from the PSAMP 165 group. 167 [PSAMP-FW]: "A Framework for Packet Selection and Reporting" 168 describes the PSAMP framework for network elements 169 to select subsets of packets by statistical and 170 other methods, and to export a stream of reports 171 on the selected packets to a Collector. 173 [PSAMP-TECH]: "Sampling and Filtering Techniques for IP Packet 174 Selection" (this document) describes the set of 175 packet selection techniques supported by PSAMP. 177 [PSAMP-PROTO]: "Packet Sampling (PSAMP) Protocol Specifications" 178 specifies the export of packet information from a 179 PSAMP Exporting Process to a PSAMP Collecting 180 Process. 182 [PSAMP-INFO]: "Information Model for Packet Sampling Exports" 183 defines an information and data model for PSAMP. 185 3. Terminology 187 The PSAMP terminology defined here is fully consistent with all 188 terms listed in [PSAMP-FW] but includes additional terms 189 required for the description of packet selection methods. An 190 architecture overview and possible configurations of PSAMP 191 elements can be found in [PSAMP-FW]. PSAMP terminology also aims 192 at consistency with terms used in [RFC3917]. The relationship 193 between PSAMP and IPFIX terms is described in [PSAMP-FW]. 195 In the PSAMP documents all defined PSAMP terms are written 196 capitalized. This document uses the same convention. 198 3.1 Observation Points, Packet Streams and Packet Content 200 * Observation Point 202 An Observation Point is a location in the network where 203 packets can be observed. Examples include: 205 (i) A line to which a probe is attached; 206 (ii) a shared medium, such as an Ethernet-based LAN; 208 (iii) a single port of a router, or set of interfaces 209 (physical or logical) of a router; 211 (iv) an embedded measurement subsystem within an interface. 213 Note that one Observation Point may be a superset of several 214 other Observation Points. For example one Observation Point 215 can be an entire line card. This would be the superset of the 216 individual Observation Points at the line card's interfaces. 218 * Observed Packet Stream 220 The Observed Packet Stream is the set of all packets observed 221 at the Observation Point. 223 * Packet Stream 225 A packet stream denotes a set of packets that flows past some 226 specified point within the metering process. An example of a 227 Packet Stream is the output of the selection process. 228 Note that packets selected from a stream, e.g. by Sampling, do 229 not necessarily possess a property by which they can be 230 distinguished from packets that have not been selected. For 231 this reason the term "stream" is favored over "flow", which is 232 defined as set of packets with common properties [RFC3917]. 234 * Packet Content 236 The packet content denotes the union of the packet header 237 (which includes link layer, network layer and other 238 encapsulation headers) and the packet payload. At some 239 Observation Points the link header information may not be 240 available. 242 3.2 Selection Process 244 * Selection Process 246 A Selection Process takes the Observed Packet Stream as its 247 input and selects a subset of that stream as its output. 249 * Selection State 251 A Selection Process may maintain state information for use by 252 the Selection Process. At a given time, the Selection State 253 may depend on packets observed at and before that time, and 254 other variables. Examples include: 256 (i) sequence numbers of packets at the input of Selectors; 258 (ii) a timestamp of observation of the packet at the 259 Observation Point; 261 (iii) iterators for pseudo-random number generators; 263 (iv) hash values calculated during selection; 265 (v) indicators of whether the packet was selected by a 266 given Selector; 268 Selection Processes may change portions of the Selection State 269 as a result of processing a packet. Selection state for a 270 packet is to reflect the state after processing the packet. 272 * Selector 274 A Selector defines the action of a Selection Process on a 275 single packet of its input. If selected, the packet becomes an 276 element of the output Packet Stream. 278 The Selector can make use of the following information in 279 determining whether a packet is selected: 281 (i) the packet's content; 283 (ii) information derived from the packet's treatment at the 284 Observation Point; 286 (iii) any selection state that may be maintained by the 287 Selection Process. 289 * Composite Selector 291 A Composite Selector is an ordered composition of Selectors, 292 in which the output Packet Stream issuing from one Selector 293 forms the input Packet Stream to the succeeding Selector. 295 * Primitive Selector 297 A Selector is primitive if it is not a Composite Selector. 299 * Selection Sequence 300 From all the packets observed at an Observation Point, only a 301 few packets are selected by one or more Selectors. The 302 Selection Sequence is a unique value per Observation Domain 303 describing the Observation Point and the Selector IDs through 304 which the packets are selected. 306 3.3 Reporting 308 * Packet Reports 310 Packet Reports comprise a configurable subset of a packet's 311 input to the Selection Process, including the packet's 312 content, information relating to its treatment (for example, 313 the output interface), and its associated selection state (for 314 example, a hash of the packet's content) 316 * Report Interpretation: 318 Report Interpretation comprises subsidiary information, 319 relating to one or more packets, that is used for 320 interpretation of their packet reports. Examples include 321 configuration parameters of the Selection Process. 323 * Report Stream: 325 The Report Stream is the output of a Metering Process, 326 comprising two distinguished types of information: Packet 327 Reports, and Report Interpretation. 329 3.4 Metering Process 331 A Metering Process selects packets from the Observed Packet 332 Stream using a Selection Process, and produces as output a 333 Report Stream concerning the selected packets. The PSAMP 334 Metering Process can be viewed as analogous to the IPFIX 335 metering process [RFC5101], which produces flow records as its 336 output. While the Metering Process definition in this 337 document specifies the PSAMP definition, the PSAMP protocol 338 specifications [PSAMP-PROTO] will use the IPFIX Metering 339 Process definition, which also suits the PSAMP requirements. 340 The relationship between PSAMP and IPFIX is described more in 341 [PSAMP-INFO] and [PSAMP-PROTO]. 343 3.5 Exporting Process 345 * Exporting Process: 347 An Exporting Process sends, in the form of Export Packet, the 348 output of one or more Metering Processes to one or more 349 Collectors. 351 * Export Packet: 353 An Export Packet is a combination of Report Interpretation 354 and/or one or more Packet Reports are bundled by the Exporting 355 Process into an Export Packet for exporting to a Collector. 357 3.6 PSAMP Device 359 * PSAMP Device 361 A PSAMP Device is a device hosting at least an Observation 362 Point, a Metering Process (which includes a Selection Process) 363 and an Exporting Process. Typically, corresponding 364 Observation Point(s), Metering Process(es) and Exporting 365 Process(es) are co-located at this device, for example at a 366 router. 368 3.7 Collector 370 * Collector 372 A Collector receives a Report Stream exported by one or more 373 Exporting Processes. In some cases, the host of the Metering 374 and/or Exporting Processes may also serve as the Collector. 376 3.8 Selection Methods 378 * Filtering 379 A filter is a Selector that selects a packet deterministically 380 based on the Packet Content, or its treatment, or functions of 381 these occurring in the Selection State. Two examples are: 383 (i) Property match filtering: a packet is selected if a 384 specific field in the packet equals a predefined 385 value. 387 (ii) Hash-based selection: a hash function is applied to 388 the Packet Content, and the packet is selected if the 389 result falls in a specified range. 391 * Sampling 393 A selector that is not a filter is called a sampling 394 operation. This reflects the intuitive notion that if the 395 selection of a packet cannot be determined from its content 396 alone, there must be some type of sampling taking place. 397 Sampling operations can be divided into two subtypes: 399 (i) Content-independent sampling, which does not use 400 Packet Content in reaching sampling decisions. 401 Examples include systematic sampling, and uniform 402 pseudo-random sampling driven by a pseudo-random 403 number whose generation is independent of Packet 404 Content. Note that in Content-independent Sampling it 405 is not necessary to access the Packet Content in order 406 to make the selection decision. 408 (ii) Content-dependent sampling, in which the Packet 409 Content is used in reaching selection decisions. An 410 application is pseudo-random selection according to a 411 probability that depends on the contents of a packet 412 field, e.g., sampling packets with a probability 413 dependent on their TCP/UDP port numbers. Note that 414 this is not a Filter. 416 * Hash Domain 418 A subset of the Packet Content and the packet treatment, 419 viewed as an N-bit string for some positive integer N. 421 * Hash Range 423 A set of M-bit strings for some positive integer M that define 424 the range of values the result of the hash operation can take. 426 * Hash Function 428 A Hash Function defines a deterministic mapping from the Hash 429 Domain into the Hash Range. 431 * Hash Selection Range 433 The Hash Selection Range is a subset of the Hash Range. The 434 packet is selected if the action of the Hash Function on the 435 Hash Domain for the packet yields a result in the Hash 436 Selection Range. 438 * Hash-based Selection 440 Hash-based Selection is a Filtering specified by a Hash 441 Domain, a Hash Function, and Hash Range and a Hash Selection 442 Range. 444 * Approximative Selection 446 Selectors in any of the above categories may be approximated 447 by operations in the same or another category for the purposes 448 of implementation. For example, uniform pseudo-random Sampling 449 may be approximated by Hash-based Selection, using a suitable 450 Hash Function and Hash Domain. In this case, the closeness of 451 the approximation depends on the choice of Hash Function and 452 Hash Domain. 454 * Population 456 A Population is a Packet Stream, or a subset of a Packet 457 Stream. A Population can be considered as a base set from 458 which packets are selected. An example is all packets in the 459 Observed Packet Stream that are observed within some specified 460 time interval. 462 * Population Size 464 The Population Size is the number of all packets in the 465 Population. 467 * Sample Size 469 The number of packets selected from the Population by a 470 Selector. 472 * Configured Selection Fraction 474 The Configured Selection Fraction is the ratio of the number 475 of packets selected by a Selector from an input Population, to 476 the Population Size, as based on the configured selection 477 parameters. 479 * Attained Selection Fraction 481 The Attained Selection Fraction is the actual ratio of the 482 number of packets selected by a Selector from an input 483 Population, to the Population Size. 485 For some sampling methods the Attained Selection Fraction can 486 differ from the Configured Selection Fraction due to, for 487 example, the inherent statistical variability in sampling 488 decisions of probabilistic Sampling and Hash-based Selection. 489 Nevertheless, for large Population Sizes and properly configured 490 Selectors, the Attained Selection Fraction usually approaches 491 the Configured Selection Fraction. 493 4. Categorization of Packet Selection Techniques 495 Packet selection techniques generate a subset of packets from an 496 Observed Packet Stream at an Observation Point. We distinguish 497 between Sampling and Filtering. 499 Sampling is targeted at the selection of a representative subset 500 of packets. The subset is used to infer knowledge about the 501 whole set of observed packets without processing them all. The 502 selection can depend on packet position, and/or on packet 503 content, and/or on (pseudo) random decisions. 505 Filtering selects a subset with common properties. This is used 506 if only a subset of packets is of interest. The properties can 507 be directly derived from the packet content, or depend on the 508 treatment given by the router to the packet. Filtering is a 509 deterministic operation. It depends on packet content or router 510 treatment. It never depends on packet position or on (pseudo) 511 random decisions. 513 Note that a common technique to select packets is to compute a 514 Hash Function on some bits of the packet header and/or content 515 and to select it if the Hash Value falls in the Hash Selection 516 Range. Since hashing is a deterministic operation on the packet 517 content, it is a Filtering technique according to our 518 categorization. Nevertheless, Hash Functions are sometimes used 519 to emulate random Sampling. Depending on the chosen input bits, 520 the Hash Function and the Hash Selection Range, this technique 521 can be used to emulate the random selection of packets with a 522 given probability p. It is also a powerful technique to 523 consistently select the same packet subset at multiple 524 Observation Points [DuGr00] 526 The following table gives an overview of the schemes described 527 in this document and their categorization. An X in brackets (X) 528 denotes schemes for which also content-independent variants 529 exist. It easily can be seen that only schemes with both 530 properties, content dependence and deterministic selection, are 531 considered as filters. 533 Selection Scheme | Deterministic | Content- | Category 534 | Selection | dependent| 535 ------------------------+---------------+----------+---------- 536 Systematic | X | _ | Sampling 537 Count-based | | | 538 ------------------------+---------------+----------+---------- 539 Systematic | X | - | Sampling 540 Time-based | | | 541 ------------------------+---------------+----------+---------- 542 Random | - | - | Sampling 543 n-out-of-N | | | 544 ------------------------+---------------+----------+---------- 545 Random | - | - | Sampling 546 Uniform probabilistic | | | 547 ------------------------+---------------+----------+---------- 548 Random | - | (X) | Sampling 549 Non-uniform probabil. | | | 550 ------------------------+---------------+----------+---------- 551 Random | - | (X) | Sampling 552 Non-uniform flow-state | | | 553 ------------------------+---------------+----------+---------- 554 Property Match | X | (X) | Filtering 555 Filtering | | | 556 ------------------------+---------------+----------+---------- 557 Hash Function | X | X | Filtering 558 ------------------------+---------------+----------+---------- 560 In the table x means that the characteristic applies to the 561 selection scheme and (x) means that the characteristic only 562 partly applies to the selection scheme. For instance property 563 match filtering is typically based on packet content and 564 therefore content dependent. But as explained in section 6.1 it 565 may also depend on router state and then would be independent of 566 the content. 568 The categorization just introduced is mainly useful for the 569 definition of an information model describing Primitive 570 Selectors. More complex selection techniques can be described 571 through the composition of cascaded Sampling and Filtering 572 operations. For example, a packet selection that weights the 573 selection probability on the basis of the packet length can be 574 described as a cascade of a Filtering and a Sampling scheme. 575 However, this descriptive approach is not intended to be rigid: 576 if a common and consolidated selection practice turns out to be 577 too complex to be described as a composition of the mentioned 578 building blocks, an ad hoc description can be specified instead 579 and added as a new scheme to the information model. 581 5. Sampling 583 The deployment of Sampling techniques aims at the provisioning 584 of information about a specific characteristic of the parent 585 population at a lower cost than a full census would demand. In 586 order to plan a suitable Sampling strategy it is therefore 587 crucial to determine the needed type of information and the 588 desired degree of accuracy in advance. 590 First of all, it is important to know the type of metric that 591 should be estimated. The metric of interest can range from 592 simple packet counts [JePP92] up to the estimation of whole 593 distributions of flow characteristics (e.g. packet 594 sizes)[ClPB93]. 596 Secondly, the required accuracy of the information and with 597 this, the confidence that is aimed at, should be known in 598 advance. For instance for usage-based accounting the required 599 confidence for the estimation of packet counters can depend on 600 the monetary value that corresponds to the transfer of one 601 packet. That means that a higher confidence could be required 602 for expensive packet flows (e.g. premium IP service) than for 603 cheaper flows (e.g. best effort). The accuracy requirements for 604 validating a previously agreed quality can also vary extremely 605 with the customer demands. These requirements are usually 606 determined by the service level agreement (SLA). 608 The Sampling method and the parameters in use must be clearly 609 communicated to all applications that use the measurement data. 610 Only with this knowledge a correct interpretation of the 611 measurement results can be ensured. 613 Sampling methods can be characterized by the Sampling algorithm, 614 the trigger type used for starting a Sampling interval and the 615 length of the Sampling interval. These parameters are described 616 here in detail. The Sampling algorithm describes the basic 617 process for selection of samples. In accordance to [AmCa89] and 618 [ClPB93] we define the following basic Sampling processes: 620 5.1 Systematic Sampling 622 Systematic Sampling describes the process of selecting the start 623 points and the duration of the selection intervals according to 624 a deterministic function. This can be for instance the periodic 625 selection of every k-th element of a trace but also the 626 selection of all packets that arrive at pre-defined points in 627 time. Even if the selection process does not follow a periodic 628 function (e.g. if the time between the Sampling intervals varies 629 over time) we consider this as systematic Sampling as long as 630 the selection is deterministic. 632 The use of systematic Sampling always involves the risk of 633 biasing the results. If the systematics in the Sampling process 634 resemble systematics in the observed stochastic process 635 (occurrence of the characteristic of interest in the network), 636 there is a high probability that the estimation will be biased. 637 Systematics in the observed process might not be known in 638 advance. 640 Here only equally spaced schemes are considered, where triggers 641 for Sampling are periodic, either in time or in packet count. 642 All packets occurring in a selection interval (either in time or 643 packet count) beyond the trigger are selected. 645 Systematic count-based 646 In systematic count-based Sampling the start and stop triggers 647 for the Sampling interval are defined in accordance to the 648 spatial packet position (packet count). 650 Systematic time-based 651 In systematic time-based Sampling time-based start and stop 652 triggers are used to define the Sampling intervals. All packets 653 are selected that arrive at the Observation Point within the 654 time-intervals defined by the start and stop triggers (i.e. 655 arrival time of the packet is larger than the start time and 656 smaller than the stop time). 658 Both schemes are content-independent selection schemes. Content 659 dependent deterministic Selectors are categorized as filter. 661 5.2 Random Sampling 663 Random Sampling selects the starting points of the Sampling 664 intervals in accordance to a random process. The selection of 665 elements are independent experiments. With this, unbiased 666 estimations can be achieved. In contrast to systematic Sampling, 667 random Sampling requires the generation of random numbers. One 668 can differentiate two methods of random Sampling: 670 5.2.1 n-out-of-N Sampling 672 In n-out-of-N Sampling n elements are selected out of the parent 673 population that consists of N elements. One example would be to 674 generate n different random numbers in the range [1,N] and 675 select all packets which have a packet position equal to one of 676 the random numbers. For this kind of Sampling the Sample Size n 677 is fixed. 679 5.2.2 Probabilistic Sampling 681 In probabilistic Sampling the decision whether an element is 682 selected or not is made in accordance to a pre-defined selection 683 probability. An example would be to flip a coin for each packet 684 and select all packets for which the coin showed the head. For 685 this kind of Sampling the Sample Size can vary for different 686 trials. The selection probability does not necessarily has to be 687 the same for each packet. Therefore we distinguish between 688 uniform probabilistic Sampling (with the same selection 689 probability for all packets) and non-uniform probabilistic 690 Sampling (where the selection probability can vary for different 691 packets). 693 5.2.2.1 Uniform Probabilistic Sampling 695 For Uniform Probabilistic Sampling packets are selected 696 independently with a uniform probability p. This Sampling can be 697 count-driven, and is sometimes referred to as geometric random 698 Sampling, since the difference in count between successive 699 selected packets are independent random variables with a 700 geometric distribution of mean 1/p. A time-driven analog, 701 exponential random Sampling, has the time between triggers 702 exponentially distributed. 703 Both geometric and exponential random Sampling are examples of 704 what is known as additive random Sampling, defined as Sampling 705 where the intervals or counts between successive samples are 706 independent identically distributed random variable. 708 5.2.2.2 Non-Uniform Probabilistic Sampling 710 This is a variant of Probabilistic Sampling in which the 711 Sampling probabilities can depend on the selection process 712 input. This can be used to weight Sampling probabilities in 713 order e.g. to boost the chance of Sampling packets that are rare 714 but are deemed important. Unbiased estimators for quantitative 715 statistics are recovered by re-normalization of sample values; 716 see [HT52]. 718 5.2.2.3 Non-Uniform Flow State Dependent Sampling 720 Another type of Sampling that can be classified as probabilistic 721 Non-Uniform is closely related to the flow concept as defined in 722 [RFC3917], and it is only used jointly with a flow monitoring 723 function (IPFIX metering process). Packets are selected, 724 dependent on a selection state. The point, here, is that the 725 selection state is determined also by the state of the flow the 726 packet belongs to and/or by the state of the other flows 727 currently being monitored by the associated flow monitoring 728 function. An example for such an algorithm is the "sample and 729 hold" method described in [EsVa01]: 731 - If a packet accounts for a flow record that already exists in 732 the IPFIX flow recording process, it is selected (i.e. the 733 flow record is updated) 734 - If a packet doesn't account to any existing flow record, it is 735 selected with probability p. If it has been selected a new 736 flow record has to be created. 738 A further algorithm that fits into the category of non-uniform 739 flow state dependent Sampling is described in [Moli03]. 741 This type of Sampling is content dependent because the 742 identification of the flow the packet belongs to requires 743 analyzing part of the packet content. If the packet is selected, 744 then it is passed as an input to the IPFIX monitoring function 745 (this is called "Local Export" in [PSAMP-FW]. Selecting the 746 packet depending on the state of a flow cache is useful when 747 memory resources of the flow monitoring function are scarce 748 (i.e. there is no room to keep all the flows that have been 749 scheduled for monitoring). 751 5.2.2.4 Configuration of non-uniform probabilistic and flow-state 752 Sampling 754 Many different specific methods can be grouped under the terms 755 non-uniform probabilistic and flow state Sampling. Dependent on 756 the Sampling goal and the implemented scheme, a different number 757 and type of input parameters is required to configure such 758 scheme. 760 Some concrete proposals for such methods exist from the research 761 community (e.g. [EsVa01],[DuLT01],[Moli03]). Some of these 762 proposals are still in an early stage and need further 763 investigations to prove their usefulness and applicability. It 764 is not our aim to indicate preference amongst these methods. 765 Instead, we only describe here the basic methods and leave the 766 specification of explicit schemes and their parameters up to 767 vendors (e.g. as extension of the information model). 769 6. Filtering 770 Filtering is the deterministic selection of packets based on the 771 packet content, the treatment of the packet at the Observation 772 Point, or deterministic functions of these occurring in the 773 selection state. The packet is selected if these quantities fall 774 into a specified range. The role of Filtering, as the word 775 itself suggest, is to separate all the packets having a certain 776 property from those not having it. A distinguishing 777 characteristic from Sampling is that the selection decision does 778 not depend on the packet position in time or in the space, or on 779 a random process. 780 We identify and describe in the following two Filtering 781 techniques. 783 6.1 Property Match Filtering 785 With this Filtering method a packet is selected if specific 786 fields within the packet and/or properties of the router state 787 equal a predefined value. Possible filter fields are all IPFIX 788 flow attributes specified in [RFC5102]. Further fields can be 789 defined by proposing new information elements or defining vendor 790 specific extensions. 792 A packet is selected if Field=Value. Masks and ranges are only 793 supported to the extent to which [RFC5102] allows them e.g. by 794 providing explicit fields like the netmasks for source and 795 destination addresses. 797 AND operations are possible by concatenating filters, thus 798 producing a composite selection operation. In this case, the 799 ordering in which the filtering happens is implicitly defined 800 (outer filters come after inner filters). However, as long as 801 the concatenation is on filters only, the result of the cascaded 802 filter is independent from the order, but the order may be 803 important for implementation purposes, as the first filter will 804 have to work at a higher rate. In any case, an implementation 805 is not constrained to respect the filter ordering, as long as 806 the result is the same, and it may even implement the composite 807 filtering in filtering in one single step. 809 OR operations are not supported with this basic model. More 810 sophisticated filters (e.g. supporting bitmasks, ranges or OR 811 operations etc.) can be realized as vendor specific schemes. 813 All IPFIX flow attributes defined in [RFC5102] can be used for 814 property match filtering. Further information elements can be 815 easily defined. Typical header fields that should be supported 816 for property match operations are the following: 818 (i) the IP header (excluding options in IPv4, stacked 819 headers in IPv6) 821 (ii) transport protocol header (e.g. TCP, UDP) 823 (iii) encapsulation headers (e.g. the MPLS label stack, if 824 present) 826 When the PSAMP Device offers property match filtering, and, in 827 its usual capacity other than in performing PSAMP functions, 828 identifies or processes information from IP, transport protocol 829 or encapsulation protocols, then the information should be made 830 available for filtering. For example, when a PSAMP Device 831 routes based on destination IP address, that field should be 832 made available for filtering. Conversely, a PSAMP Device that 833 does not route is not expected to be able to locate an IP 834 address within a packet, or make it available for Filtering, 835 although it may do so. 837 Since packet encryption conceals the real values of encrypted 838 fields, property match filtering must be configurable to ignore 839 encrypted packets, when detected. 841 The Selection Process may support filtering based on the 842 properties of the router state: 844 (i) Ingress interface at which packet arrives equals a 845 specified value 847 (ii) Egress interface to which packet is routed to equals a 848 specified value 850 (iii) Packet violated Access Control List (ACL) on the 851 router 853 (iv) Failed Reverse Path Forwarding (RPF) 855 (v) Failed Resource Reservation (RSVP) 857 (vi) No route found for the packet 859 (vii) Origin Border Gateway Protocol (BGP) Autonomous System 860 (AS) [RFC4271] equals a specified value or lies within 861 a given range 862 (viii)Destination BGP AS equals a specified value or lies 863 within a given range 865 Packets that match the Failed Reverse Path Forwarding (RPF) 866 condition are packets for which ingress filtering failed as 867 defined in [RFC3704]. 868 Packets that match the Failed Resource Reservation condition are 869 packets that do not fulfill the RSVP specification as defined in 870 [RCF2205]. 872 Router architectural considerations may preclude some 873 information concerning the packet treatment being available at 874 line rate for selection of packets. For example, the Selection 875 Process may not be implemented in the fast path that is able to 876 access routing state at line rate. However, when filtering 877 follows sampling (or some other selection operation) in a 878 Composite Selector, the rate of the Packet Stream output from 879 the sampler and input to the filter may be sufficiently slow 880 that the filter could select based on routing state. 882 6.2 Hash-based Filtering 884 A Hash Function h maps the Packet Content c, or some portion of 885 it, onto a Hash Range R. The packet is selected if h(c) is an 886 element of S, which is a subset of R called the Hash Selection 887 Range. Thus Hash-based Selection is a particular case of 888 Filtering. The object is selected if c is in inv(h(S)). But for 889 desirable Hash Functions the inverse image inv(h(S)) will be 890 extremely complex, and hence h would not be expressible as, say, 891 a Property Match Filter or a simple combination of these. 893 Hash-based selection is mainly used to realize a coordinated 894 packet selection. That means that the same packets are selected 895 at different Observation Points. This is useful for instance to 896 observe the path (trajectory) that a packet took through the 897 network or to apply packet selection to passive one-way 898 measurements. 900 A pre-requisite for the method to work and to ensure 901 interoperability is that the same Hash Function with the same 902 parameters (e.g. input vector) is used at the observation 903 points. 905 A consistent packet selection is also possible with property 906 match filtering. Nevertheless, hash-based selection can be used 907 to approximate a random selection. The desired statistical 908 properties are discussed in section 6.2.2. 910 In the following subsections we give some application examples 911 for coordinated packet selection. 913 6.2.1 Application Examples for Coordinated Packet Selection 915 6.2.1.1 Trajectory Sampling 917 Trajectory Sampling is the consistent selection of a subset of 918 packets at either all of a set of Observation Points or none of 919 them. Trajectory Sampling is realized by Hash-based Selection if 920 all Observation Points in the set use a common Hash Function, 921 Hash Domain and selection range. The Hash Domain comprises all 922 or part of the packet content that is invariant along the packet 923 path. Fields such as Time-to-Live, which is decremented per hop, 924 and header CRC, which is recalculated per hop, are thus excluded 925 from the Hash Domain. The Hash Domain needs to be wider than 926 just a flow key, if packets are to be selected quasi-randomly 927 within flows. 929 The trajectory (or path) followed by a packet is reconstructed 930 from PSAMP reports on it that reach a Collector. Reports on a 931 given packet originating from different observations points are 932 associated by matching a label from the reports. The label may 933 comprise that portion invariant packet content that is reported, 934 or possibly some digest of the invariant packet content that is 935 inserted into the packet report at the Observation Point. Such a 936 digest may be constructed by applying a second Hash Function 937 (distinct from that used for selection) to the invariant packet 938 content. The reconstruction of trajectories, and methods for 939 dealing with possible ambiguities due to label collisions 940 (identical labels reported for different packets) and potential 941 loss of reports in transmission, are dealt with in [DuGr00], 942 [DuGG02] and [DuGr04]. 944 Applications of trajectory Sampling include (i) estimation of 945 the network path matrix, i.e., the traffic intensities according 946 to network path, broken down by flow key; (ii) detection of 947 routing loops, as indicated by self-intersecting trajectories; 948 (iii) passive performance measurement: prematurely terminating 949 trajectories indicate packet loss, packet one way delay can be 950 determined if reports include (synchronized) timestamps of 951 packet arrival at the Observation Point; (iv) network attack 952 tracing, of the actual paths taken by attack packets with 953 spoofed source addresses. 955 6.2.1.2 Passive One-way Measurements 957 Coordinated packet selection can be applied for instance to one- 958 way delay measurements in order to reduce the required 959 resources. In one-way delay measurements packets are collected 960 at different Observation Points in the network. A packet digest 961 is generated for each packet that helps to identify the packet. 962 The packet digest and the arrival time of the packet at the 963 observation point are reported to a process that calculates the 964 delay. The delay is calculated by subtracting the arrival time 965 of the same packet at the observation points (e.g. [ZsZC01]). 966 With high data rates, capturing all packets can require a lot of 967 resources for storage, transfer and processing. To reduce 968 resource consumption packet selection methods can be applied. 969 But for such selection techniques it has to be ensured that the 970 same packets are collected at different observation points. 972 6.2.1.3 Generation of Pseudo-random Numbers 974 Although pseudo-random number generators with well understood 975 properties have been developed, they may not be the method of 976 choice in settings where computational resources are scarce. A 977 convenient alternative is to use Hash Functions of packet 978 content as a source of randomness. The hash (suitably re- 979 normalized) is a pseudo-random variate in the interval [0,1]. 980 Other schemes may use packet fields in iterators for pseudo- 981 random numbers. However, the statistical properties of an ideal 982 packet selection law (such as independent Sampling for different 983 packets, or independence on packet content) may not be exactly 984 rendered by an implementation, but only approximately so. 986 Use of packet content to generate pseudo-random variates shares 987 with Non-uniform Probabilistic Sampling (see Section 3.1.2.2.2 988 above) the property that selection decisions depend on Packet 989 Content. However, there is a fundamental difference between the 990 two. In the former case the content determines pseudo-random 991 variates. In the latter case the content only determines the 992 selection probabilities: selection could then proceed e.g., by 993 use of random variates obtained by an independent pseudo-random 994 number generator. 996 6.2.2 Desired Properties of Hash Functions 998 Here we formulate desired properties for hash functions. For 999 this we have to distinguish whether a hash function is used for 1000 packet selection or just as a packet digest. The main purpose of 1001 this document is on packet selection. Nevertheless, we also 1002 provide some requirements for the use of hash functions as 1003 packet digest. 1005 First of all we need to define suitable input fields from the 1006 packet. In accordance to [DuGr00] input field should be 1007 - invariant on the path 1008 - variable among packets 1010 Only if the input fields are the same at different observation 1011 points it is possible to recognize the packet. The input fields 1012 should be variable among packets in order to distribute the hash 1013 results over the Selection Range. 1015 6.2.2.1 Requirements for Packet Selection 1017 In accordance to considerations in [MoND05] and [Henk08] we 1018 define the following desired properties of hash functions used 1019 for packet selection: 1021 (i) Speed: The hash function has to be applied to each packet 1022 that traverses the observation point. Therefore it has to be 1023 fast in order to cope with the high packet rates. In the ideal 1024 case the hash operation should not influence the performance on 1025 the PSAMP device. 1027 (ii) Uniformity: The Hash Function h should have good mixing 1028 properties, in the sense that small changes in the input (e.g. 1029 the flipping of a single bit) cause large changes in the output 1030 (many bits change). Then any local clump of values of c is 1031 spread widely over R by h, and so the distribution of h(c) is 1032 fairly uniform even if the distribution of c is not. Then the 1033 Sampling Fraction is #S/#R, which can be tuned by choice of S. 1035 (iii) Unbiasedness: The selection decision should be as 1036 independent of packet attributes as possible. The set of 1037 selected packets should not be biased towards a specific type of 1038 packets. 1040 (iv) Representativeness of sample: The sample should be as 1041 representative as possible for the observed traffic. 1043 (v) Non-linearity: The function should not be linear. This 1044 increases the mixing properties (uniformity criterion). In 1045 addition to this it decreases the predictability of the output 1046 and therefore the vulnerabilities against attacks. 1048 (vi) Robustness against vulnerabilities: The hash function 1049 should be robust against attacks. Potential vulnerabilities are 1050 described in section 6.2.3. 1052 6.2.2.2 Requirements for Packet Digesting 1053 For digesting Packet Content for inclusion in a reported label, 1054 the most important property is a low collision frequency. A 1055 secondary requirement is the ability to accept variable length 1056 input, in order to allow inclusion of maximal amount of packet 1057 as input. Execution speed is of secondary importance, since the 1058 digest need only be formed from selected packets. 1060 6.2.3 Security Considerations for Hash Functions 1062 A concern for Hash-based Selection is whether some large set of 1063 related packets could be disproportionately sampled, i.e., that 1064 the Attained Sampling Fraction is significantly different from 1065 the Configured Sampling Fraction. This can happen either 1067 (i) through unanticipated behavior in the Hash Function, or 1069 (ii) because the packets had been deliberately crafted to have 1070 this property. 1072 The first point underlines the importance of using a Hash 1073 Function with good mixing properties. For this the statistical 1074 properties of candidate Hash Functions need to be evaluated. 1075 Since the hash output depends on the traffic mix, the evaluation 1076 should be done preferably on up-to-date packet traces from the 1077 network in which the hash-based selection will be deployed. 1079 However, hash functions which perform well on typical traffic 1080 may not be sufficiently strong to withstand attacks specifically 1081 targeted against them. Such potential attacks have been 1082 described in [GoRe07]. 1084 The following we point out different potential attack scenarios. 1085 We encourage the use of standardized hash functions. Therefore 1086 we assume that the hash function itself is public and hence 1087 known to an attacker. 1088 Nevertheless, we also assume the possibility of using a private 1089 input parameter for the hash function that is kept secret. Such 1090 an input parameter can for instance be attached to the hash 1091 input before the hash operation is applied. With this at least 1092 parts of the hash operation remains secret. 1094 For the attack scenarios we assume that an attacker uses its 1095 knowledge of the hash function to craft packets which are then 1096 dispatched, either as the attack itself, or to elicit further 1097 information which can be used to refine the attack. 1099 Two scenarios are considered. In the first scenario, the 1100 attacker has no knowledge about whether the crafted packets are 1101 selected or not. In the second scenario the attacker uses some 1102 knowledge of sampling outcomes. The means by which this might be 1103 acquired is discussed below. Some additional attacks that 1104 involve tampering with export packets in transit, as opposed to 1105 attacking the PSAMP device, are discussed in [GoRe07]. 1107 6.2.3.1 Vulnerabilities of Hash-based selection without knowledge 1108 of selection outcomes 1110 (i) The hash function does not use a private parameter. 1112 If no private input parameter is used, potential attackers can 1113 easily calculate which packets result in which hash values. 1114 If the selection range is public, an attacker can craft packets 1115 whose selection properties are known in advance. If the 1116 selection range is private, an attacker cannot determine whether 1117 a crafted packet is selected. However by computing the hash on 1118 different trial crafted packets, and selecting those yielding a 1119 given hash value, the attacker can construct an arbitrarily 1120 large set of distinct packets with a common selection 1121 properties, i.e., packets that will be either all selected or 1122 all not selected. This can be done whatever the strength of the 1123 hash function. 1125 (ii) The hash function is not cryptographically strong. 1127 If the hash function is not cryptographically strong, it may be 1128 possible to construct sequences of distinct packets with the 1129 common selection property even if a private parameter is used. 1131 An example is the standard CRC-32 hash function used with a 1132 private modulus (but without a private string post-pended to the 1133 input). It has weak mixing properties for low order bits. 1134 Consequently, simply by incrementing the hash input, one obtains 1135 distinct packets whose hashes mostly fall in a narrow range, and 1136 hence are likely commonly selected; see [GoRe07] 1138 Suitable parameterization of the hash function can make such 1139 attacks more difficult. For example, post-pending a private 1140 string to the input before hashing with CRC-32 will give 1141 stronger mixing properties over all bits of the input. However, 1142 with a hash function, such as CRC-32, that is not 1143 cryptographically strong, the possibility of discovering a 1144 method to construct packet sets with the common selected 1145 property cannot be ruled out, even when a private modulus or 1146 post-pended string is used. 1148 6.2.3.2 Vulnerabilities of Hash-based selection using knowledge of 1149 selection outcomes 1151 Knowledge of the selection outcomes of crafted packets can be 1152 used by an attacker to more easily construct sets of packets 1153 which are disproportionately sampled and/or are commonly 1154 selected. For this the attacker does not need any a priori 1155 knowledge about the hash function or selection range. 1157 There are several ways an attacker might acquire this knowledge 1158 about the selection outcome: 1160 (i) Billing Reports: if samples are used for billing purposes, 1161 then the selection outcomes of packets may be able to be 1162 inferred by correlating a crafted packet stream with the billing 1163 reports that it generates. However, the rate at knowledge of 1164 selection outcomes can be acquired depends on the temporal and 1165 spatial granularity of the billing reports, being slower the 1166 more aggregated the reports are. 1168 (ii) Feedback from an Intrusion Detection System: e.g., a 1169 botmaster adversary learns if his packets were detected by the 1170 intrusion detection system by seeing if one of his bots is 1171 blocked by the network. 1173 (iii) Observation of the Report Stream: export packets sent 1174 across a public network may be eavesdropped on by an adversary. 1175 Encryption of the export packets provides only a partial 1176 defense, since it may be possible to infer the selection 1177 outcomes of packets by correlating a crafted packet stream with 1178 the occurrence (not the content) of packets in the export stream 1179 that it generates. The rate at which such knowledge could be 1180 acquired is limited by the temporal resolution at which reports 1181 can be associated with packets, e.g. due to processing and 1182 propagation variability, and difficulty in distinguishing report 1183 on attack packets from those of background traffic, if present. 1184 The association between packets and their reports on which this 1185 depends could be removed by padding export packets to a constant 1186 length and sending them at a constant rate. 1188 We now turn to attacks that can exploit knowledge of selection 1189 outcomes. Firstly, with a non-cryptographic hash function, 1190 knowledge of selection outcomes for a trial stream may be used 1191 to further craft a packet set with the common selection 1192 property. This has been demonstrated for the modular hash f(x) = 1193 a x + b mod k, for private parameters a, b, and k. With sampling 1194 rate p, knowledge of the sampling outcomes of roughly 2/p is 1195 sufficient for the attack to succeed, independent of the values 1196 of a, b and k. With knowledge of the selection outcomes of a 1197 larger number of packets, the parameters a b and k can be 1198 determined; see [GoRe07]. 1200 A cryptographic hash function employing a private parameter and 1201 operating in one of the pseudo-random function modes specified 1202 above is not vulnerable to these attacks, even if the selection 1203 range is known. 1205 6.2.3.3 Vulnerabilities to Replay Attacks 1207 Since hash-based selection is deterministic, any packet or set 1208 of packets with known selection properties can be replayed into 1209 a network and experience the same selection outcomes provide the 1210 hash function and its parameters are not changed. Repetition of 1211 a single packet may be noticeable to other measurement methods 1212 if employed (e.g. collection of flow statistics), whereas a set 1213 of distinct packets that appears statistically similar to 1214 regular traffic may be less noticeable. 1216 Replay attacks may be mitigated by repeated changing of hash 1217 function parameters. This also prevents attacks that exploit 1218 knowledge of sampling outcomes, at least if the parameters are 1219 changed at least as fast as the knowledge can be acquired by an 1220 attacker. In order to preserve the ability to perform Trajectory 1221 Sampling, parameter changed would have to be simultaneous (or 1222 approximately so) across all observation point. 1224 6.2.4 Choice of Hash-Function 1226 The specific choice of hash function represents a trade-off 1227 between complexity and ease of implementation. Ideally, a 1228 cryptographically strong hash function employing a private 1229 parameter and operating in pseudo-random function mode as 1230 specified above would be used, yielding a good emulation a 1231 random packet selection at a target sampling rate, and giving 1232 maximal robustness against the attacks described in the previous 1233 section. Unfortunately there is currently no single hash 1234 function that fulfills all the requirements. 1236 As detailed in section 6.2.3, only cryptographic hash functions 1237 employing a private parameter operating in pseudo-random 1238 function mode are sufficiently strong to withstand the range of 1239 conceivable attacks. For example, fixed or variable length 1240 inputs could be hashed using a block cipher (like AES) in 1241 cipher-block-chaining mode. Fixed length inputs could also be 1242 hashed using an iterated cryptographic hash function (like MD5 1243 or SHA1), with a private initial vector. For variable length 1244 inputs, iterated cryptographic hash function (like MD5 or SHA1) 1245 should employ private string post-pended to the data in addition 1246 to a private initial vector. For more details, see the "append- 1247 cascade" construction of [BeCK96]. We encourage the use of such 1248 cryptographically strong hash function wherever possible. 1250 However, a problem with using such function is the low 1251 performance. As shown for instance in [Henk08], the computation 1252 time for MD5 and SHA are about 7-10 times higher compared to 1253 non-cryptographic functions. The difference increases for small 1254 hash input lengths. 1256 Therefore it is not assumed that all PSAMP devices will be 1257 capable of applying a cryptographically strong hash function to 1258 every packet at line rate. For this reason, the hash functions 1259 listed in this section will be of a weaker variety. Future 1260 protocol extensions that employ stronger hash functions are 1261 highly welcome. 1263 Comparisons of hash-functions for packet selection and packet 1264 digesting with regard to various criteria can be found in 1265 [MoND05] and [Henk08]. 1267 6.2.4.1 Hash Functions for Packet Selection 1269 If hash-based packet selection is applied, the BOB function MUST 1270 be used for packet selection operations in order to be compliant 1271 with PSAMP. The specification of BOB is given in the appendix. 1272 Both the parameter (the init value) and the selection range 1273 should be kept private. The initial vector of the hash function 1274 MUST be configurable out of band to prevent security breaches 1275 like exposure of the initial vector content. 1277 Other functions, such as CRC-32 and IPSX MAY be used. The IPSX 1278 function is described in the appendix, the CRC-32 function is 1279 described in [RFC1141]. If CRC-32 is used, the input should 1280 first be post-pended with a private string that acts as a 1281 parameter, and the modulus of the CRC should also be kept 1282 private. 1284 IPSX is simple to implement and was correspondingly about an 1285 order of magnitude faster to execute per packet than BOB or CRC- 1286 32 [MoND05]. 1288 All three hash functions evaluated showed relatively poor 1289 uniformity with 16 byte input that was drawn from only invariant 1290 fields in the IP and TCP/UDP headers (i.e. header fields that do 1291 not change from hop to hop). IPSX is inherently limited to 16 1292 bytes. 1293 BOB and CRC-32 exhibits noticeably better uniformity when 4 or 1294 more bytes from the payload are also included in the input 1295 [MoND05]. Also with other criteria BOB performed quite well 1296 [Henk08] 1298 Although the characteristics have been checked for different 1299 traffic traces, results cannot be generalized to arbitrary 1300 traffic. Since hash-based selection is a deterministic function 1301 on the packet content, it can always be biased towards packets 1302 with specific attributes. Furthermore, it should be noted that 1303 all Hash Functions were evaluated only for IPv4. 1305 None of these hash functions is recommended for cryptographic 1306 purposes. Please also note that the use of a private parameter 1307 only slightly reduces the vulnerabilities against attacks. As 1308 shown in section 6.2.3. functions that are not cryptographically 1309 strong (e.g., BOB and CRC) cannot prevent attackers from 1310 crafting packets that are disproportionally selected even if a 1311 private parameter is used and the selection range is kept 1312 secret. 1314 As described in section 6.2.2 the input bytes for the Hash 1315 Function need to be invariant along the path the packet is 1316 traveling. Only with this it is ensured that the same packets 1317 are selected at different observation points. Furthermore they 1318 should have a high variability between different packets to 1319 generate a high variation in the Hash Range. An evaluation of 1320 the variability of different packet header fields can be found 1321 in [DuGr00], [HeSZ08] and [Henk08]. 1323 If a hash-based selection with the BOB function is used with 1324 IPv4 traffic, the following input bytes MUST be used. 1325 - IP identification field 1326 - Flags field 1327 - Fragment offset 1328 - Source IP address 1329 - Destination IP address 1330 - A configurable number of bytes from the IP payload, starting 1331 at a configurable offset. 1333 Due to the lack of suitable IPv6 packet traces, all candidate 1334 Hash Functions in [DuGr00], [MoND05] and [Henk08] were evaluated 1335 only for IPv4. Due to the IPv6 header fields and address 1336 structure it is expected that there is less randomness in IPv6 1337 packet headers than in IPv4 headers. Nevertheless, the 1338 randomness of IPv6 traffic has not yet been evaluated 1339 sufficiently to get any evidence. In addition to this, IPv6 1340 traffic profiles may change significantly in future when IPv6 is 1341 used by a broader community. 1343 If a hash-based selection with the BOB function is used with 1344 IPv6 traffic, the following input bytes MUST be used. 1345 - Payload length (2 bytes) 1346 - Byte number 10,11,14,15,16 of the IPv6 source address 1347 - Byte number 10,11,14,15,16 of the IPv6 destination address 1348 - A configurable number of bytes from the IP payload, starting 1349 at a configurable offset. It is recommended to use at least 4 1350 bytes from the IP payload. 1352 The payload itself is not changing during the path. Even if some 1353 routers process some extension headers they are not going to 1354 strip them from the packet. Therefore the payload length is 1355 invariant along the path. Furthermore it usually differs for 1356 different packets. The IPv6 address has 16 bytes. The first part 1357 is the network part and it contains low variation. The second 1358 part is the host part and contains higher variation. Therefore 1359 the second part of the address is used. Nevertheless, the 1360 uniformity has not been checked for IPv6 traffic. 1362 6.2.4.2 Hash Functions Suitable for Packet Digesting 1364 For this purpose also the BOB function SHOULD be used. Other 1365 functions (such as CRC-32) MAY be used. Among the functions 1366 capable of operating with variable length input BOB and CRC-32 1367 have the fastest execution, BOB being slightly faster. IPSX is 1368 not recommended for digesting because it has a significantly 1369 higher collision rate and takes only a fixed length input. 1371 7. Parameters for the Description of Selection Techniques 1373 This section gives an overview of different alternative 1374 selection schemes and their required parameters. In order to be 1375 compliant with PSAMP at least one of proposed schemes MUST be 1376 implemented. 1378 The decision whether to select a packet or not is based on a 1379 function which is performed when the packet arrives at the 1380 selection process. Packet selection schemes differ in the input 1381 parameters for the selection process and the functions they 1382 require to do the packet selection. The following table gives an 1383 overview. 1385 Scheme | input parameters | functions 1386 ---------------+------------------------+------------------- 1387 systematic | packet position | packet counter 1388 count-based | Sampling pattern | 1389 ---------------+------------------------+------------------- 1390 systematic | arrival time | clock or timer 1391 time-based | Sampling pattern | 1392 ---------------+------------------------+------------------- 1393 random | packet position | packet counter, 1394 n-out-of-N | Sampling pattern | random numbers 1395 | (random number list) | 1396 ---------------+------------------------+------------------- 1397 uniform | Sampling | random function 1398 probabilistic | probability | 1399 ---------------+------------------------+------------------- 1400 non-uniform |e.g. packet position, | selection function, 1401 probabilistic | packet content(parts) | probability calc. 1402 ---------------+------------------------+------------------- 1403 non-uniform |e.g. flow state, | selection function, 1404 flow-state | packet content(parts) | probability calc. 1405 ---------------+------------------------+------------------- 1406 property | packet content(parts) | filter function or 1407 match | or router state | state discovery 1408 ---------------+------------------------+------------------- 1409 hash-based | packet content(parts) | Hash Function 1410 ---------------+------------------------+------------------- 1412 7.1 Description of Sampling Techniques 1414 In this section we define what elements are needed to describe 1415 the most common Sampling techniques. Here the selection function 1416 is pre-defined and given by the Selector ID. 1418 Sampler Description: 1419 SELECTOR_ID 1420 SELECTOR_TYPE 1421 SELECTOR_PARAMETERS 1423 Where: 1425 SELECTOR_ID: 1426 Unique ID for the packet sampler. 1428 SELECTOR_TYPE 1429 For Sampling processes the SELECTOR TYPE defines what Sampling 1430 algorithm is used. 1431 Values: Systematic Count-based | Systematic Time-based | Random 1432 n-out-of-N | Uniform Probabilistic | Non-uniform Probabilistic | 1433 Non-uniform Flow-state 1435 SELECTOR_PARAMETERS 1436 For Sampling processes the SELECTOR PARAMETERS define the input 1437 parameters for the process. Interval length in systematic 1438 Sampling means, that all packets that arrive in this interval 1439 are selected. The spacing parameter defines the spacing in time 1440 or number of packets between the end of one Sampling interval 1441 and the start of the next succeeding interval. 1443 Case n out of N: 1444 - Population size N, Sample size n 1446 Case Systematic Time Based: 1447 - Interval length (in usec), Spacing (in usec) 1449 Case Systematic Count Based: 1450 - Interval length(in packets), Spacing (in packets) 1452 Case Uniform Probabilistic (with equal probability per packet): 1453 - Sampling probability p 1455 Case Non-uniform Probabilistic: 1456 - Calculation function for Sampling probability p (see also 1457 section 5.2.2.4) 1459 Case flow state: 1460 - Information reported for flow state sampling are not 1461 defined in this document (see also section 5.2.2.4) 1463 7.2 Description of Filtering Techniques 1465 In this section we define what elements are needed to describe 1466 the most common Filtering techniques. The structure closely 1467 parallels the one presented for the Sampling techniques. 1469 Filter Description: 1470 SELECTOR_ID 1471 SELECTOR_TYPE 1472 SELECTOR_PARAMETERS 1474 Where: 1476 SELECTOR_ID: 1478 Unique ID for the packet filter. The ID can be calculated under 1479 consideration of the SELECTION SEQUENCE and a local ID. 1481 SELECTOR_TYPE 1482 For Filtering processes the SELECTOR TYPE defines what Filtering 1483 type is used. 1484 Values: Matching | Hashing | Router_state 1486 SELECTOR_PARAMETERS 1487 For Filtering processes the SELECTOR PARAMETERS define formally 1488 the common property of the packet being filtered. For the 1489 filters of type Matching and Hashing the definitions have a lot 1490 of points in common. 1492 Values: 1494 Case Matching 1495 - Information Element (from [RFC5102]) 1496 - Value (type in accordance to [RFC5102]) 1498 In case of multiple match criteria, multiple "case matching" 1499 have to be bound by a logical AND. 1501 Case Hashing: 1502 - Hash Domain (Input bits from packet) 1503 -
1504 - 1505 -
1506 - 1507 - 1508 - 1509 - Hash Function 1510 - Hash function name 1511 - Length of input key (eliminate 0x bytes) 1512 - Output value (length M and bitmask) 1513 - Hash Selection Range, as a list of non overlapping 1514 intervals [start value, end value] where value is in 1515 [0,2^M-1] 1516 - Additional parameters dependent on specific Hash 1517 Function (e.g. hash input bits (seed)) 1519 Notes to input bits for Case Hashing: 1520 - Input bits can be from header part only, from the payload 1521 part only or from both. 1522 - The bit specification, for the header part, can be 1523 specified for IPv4 or IPv6 only, or both 1525 - In case of IPv4, the bit specification is a sequence of 20 1526 Hexadecimal numbers [00,FF] specifying a 20 bytes bitmask 1527 to be applied to the header. 1528 - In case of IPv6, it is a sequence of 40 Hexadecimal numbers 1529 [00,FF] specifying a 40 bytes bitmask to be applied to the 1530 header 1531 - The bit specification, for the payload part, is a sequence 1532 of Hexadecimal numbers [00,FF] specifying the bitmask to be 1533 applied to the first N bytes of the payload, as specified 1534 by the previous field. In case the Hexadecimal number 1535 sequence is longer than N, only the first N numbers are 1536 considered. 1537 - In case the payload is shorter than N, the Hash Function 1538 cannot be applied. Other options, like padding with zeros, 1539 may be considered in the future. 1540 - A Hash Function cannot be defined on the options field of 1541 the IPv4 header, neither on stacked headers of IPv6. 1542 - The Hash Selection Range defines a range of hash-values 1543 (out of all possible results of the Hash-Operation). If the 1544 hash result for a specific packet falls in this range, the 1545 packet is selected. If the value is outside the range, the 1546 packet is not selected. E.g. if the selection interval 1547 specification is [1:3], [6:9] all packets are selected for 1548 which the hash result is 1,2,3,6,7,8, or 9. In all other 1549 cases the packet is not selected. 1551 Case Router State: 1553 - Ingress interface at which the packet arrives equals a 1554 specified value 1555 - Egress interface to which the packet is routed equals a 1556 specified value 1557 - Packet violated Access Control List (ACL) on the router 1558 - Reverse Path Forwarding (RPF) failed for the packet 1559 - Resource Reservation is insufficient for the packet 1560 - No route found for the packet 1561 - Origin AS equals a specified value or lies within a given 1562 range 1563 - Destination AS equals a specified value or lies within a 1564 given range 1566 Note to Case Router State: 1567 - All Router state entries can be linked by AND operators 1569 8. Composite Techniques 1571 Composite schemes are realized by combining the selector IDs 1572 into a Selection Sequence. The Selection Sequence contains all 1573 selector IDs that are applied to the packet stream subsequently. 1574 Some examples of composite schemes are reported below. 1576 8.1 Cascaded Filtering->Sampling or Sampling->Filtering 1578 If a filter precedes a Sampling process the role of Filtering is 1579 to create a set of "parent populations" from a single stream 1580 that can then be fed independently to different Sampling 1581 functions, with different parameters tuned for the population 1582 itself (e.g. if streams of different intensity result from 1583 Filtering, it may be good to have different Sampling rates). If 1584 Filtering follows a Sampling process, the same Sampling Fraction 1585 and type is applied to the whole stream, independently of the 1586 relative size of the streams resulting from the Filtering 1587 function. Moreover, also packets not destined to be selected in 1588 the Filtering operation will "load" the Sampling function. So, 1589 in principle, Filtering before Sampling allows a more accurate 1590 tuning of the Sampling procedure, but if filters are too complex 1591 to work at full line rate (e.g. because they have to access 1592 router state information), Sampling before Filtering may be a 1593 need. 1595 8.2 Stratified Sampling 1597 Stratified Sampling is one example for using a composite 1598 technique. The basic idea behind stratified Sampling is to 1599 increase the estimation accuracy by using a-priori information 1600 about correlations of the investigated characteristic with some 1601 other characteristic that is easier to obtain. The a-priori 1602 information is used to perform an intelligent grouping of the 1603 elements of the parent population. In this manner, a higher 1604 estimation accuracy can be achieved with the same Sample Size or 1605 the Sample Size can be reduced without reducing the estimation 1606 accuracy. 1608 Stratified Sampling divides the Sampling process into multiple 1609 steps. First, the elements of the parent population are grouped 1610 into subsets in accordance to a given characteristic. This 1611 grouping can be done in multiple steps. Then samples are taken 1612 from each subset. 1614 The stronger the correlation between the characteristic used to 1615 divide the parent population (stratification variable) and the 1616 characteristic of interest (for which an estimate is sought 1617 after), the easier is the consecutive Sampling process and the 1618 higher is the stratification gain. For instance, if the dividing 1619 characteristic were equal to the investigated characteristic, 1620 each element of the sub-group would be a perfect representative 1621 of that characteristic. In this case it would be sufficient to 1622 take one arbitrary element out of each subgroup to get the 1623 actual distribution of the characteristic in the parent 1624 population. Therefore stratified Sampling can reduce the costs 1625 for the Sampling process (i.e. the number of samples needed to 1626 achieve a given level of confidence). 1628 For stratified Sampling one has to specify classification rules 1629 for grouping the elements into subgroups and the Sampling scheme 1630 that is used within the subgroups. The classification rules can 1631 be expressed by multiple filters. For the Sampling scheme within 1632 the subgroups the parameters have to be specified as described 1633 above. The use of stratified Sampling methods for measurement 1634 purposes is described for instance in [ClPB93] and [Zseb03]. 1636 9. Security Considerations 1638 Security considerations concerning the choice of sampling hash 1639 function have been discussed in Section 6.2.2. That section 1640 discussed a number of potential attacks to craft packet streams 1641 which are disproportionately detected and/or discover the hash 1642 function parameters, the vulnerabilities of different hash 1643 functions to these attacks, and practices to minimize these 1644 vulnerabilities. 1646 In addition to this a user can gains knowledge about the start 1647 and stop triggers in time-based systematic sampling e.g. by 1648 sending test packets. This knowledge might allow users to modify 1649 their send schedule in a way that their packets are 1650 disproportionately selected or not selected [GoRe07]. 1652 For random sampling cryptographically-strong random number 1653 generator should be used in order to prevent that an advisory 1654 can predict the selection decision [GoRe07]. 1656 Further security threats can occur when sampling parameters are 1657 configured or communicated to other entities. The configuration 1658 and reporting of sampling parameters are out of scope of this 1659 document. Therefore the security threats that originate from 1660 this kind of communication cannot be assessed with the 1661 information given in this document. 1663 Some of these threats can probably be addressed by keeping 1664 configuration information confidential and by authenticating 1665 entities that configure sampling. Nevertheless a full analysis 1666 and assessment of threats for configuration and reporting has to 1667 be done if configuration or reporting methods are proposed. 1669 10. Acknowledgements 1671 We would like to thank the PSAMP group, especially Benoit Claise 1672 and Stewart Bryant, for fruitful discussions and for 1673 proofreading the document. We thank Sharon Goldberg for her 1674 input on security issues concerning hash-based selection. 1676 11. IANA Considerations 1678 This document has no actions for IANA. 1680 12. Normative References 1682 [RFC2119] Bradner, S., Key words for use in RFCs to Indicate 1683 Requirement Levels, BCP 14, RFC 2119, March 1997 1685 13. Informative References 1687 [AmCa89] Paul D. Amer, Lillian N. Cassel, "Management of 1688 Sampled Real-Time Network Measurements", 14th 1689 Conference on Local Computer Networks, October 1690 1989, Minneapolis, pages 62-68, IEEE, 1989. 1692 [BeCK96] M. Bellare, R. Canetti and H. Krawczyk, 1693 "Pseudorandom Functions Revisited: The Cascade 1694 Construction and its Concrete Security", Symposium 1695 on Foundations of Computer Science, 1996. 1697 [ClPB93] K.C. Claffy, George C. Polyzos, Hans-Werner Braun, 1698 "Application of Sampling Methodologies to Network 1699 Traffic Characterization", Proceedings of ACM 1700 SIGCOMM'93, San Francisco, CA, USA, September 13 - 1701 17, 1993. 1703 [DuGG02] N.G. Duffield, A. Gerber, M. Grossglauser, 1704 "Trajectory Engine: A Backend for Trajectory 1705 Sampling", IEEE Network Operations and Management 1706 Symposium 2002, Florence, Italy, April 15-19, 2002. 1708 [DuGr00] N.G. Duffield, M. Grossglauser, "Trajectory 1709 Sampling for Direct Traffic Observation", 1710 Proceedings of ACM SIGCOMM 2000, Stockholm, Sweden, 1711 August 28 - September 1, 2000. 1713 [DuGr04] N. G. Duffield and M. Grossglauser "Trajectory 1714 Sampling with Unreliable Reporting", Proc IEEE 1715 Infocom 2004, Hong Kong, March 2004. 1717 [DuLT01] N.G. Duffield, C. Lund, and M. Thorup, "Charging 1718 from Sampled Network Usage", ACM Internet 1719 Measurement Workshop IMW 2001, San Francisco, USA, 1720 November 1-2, 2001. 1722 [EsVa01] C. Estan and G. Varghese, "New Directions in 1723 Traffic Measurement and Accounting", ACM SIGCOMM 1724 Internet Measurement Workshop 2001, San Francisco 1725 (CA) Nov. 2001. 1727 [GoRe07] S. Goldberg, J. Rexford, "Security Vulnerabilities 1728 and Solutions for Packet Sampling", IEEE Sarnoff 1729 Symposium, Princeton, NJ, May 2007. 1731 [HT52] D.G. Horvitz and D.J. Thompson, "A Generalization 1732 of Sampling without replacement from a Finite 1733 Universe" J. Amer. Statist. Assoc. Vol. 47, pp. 1734 663-685, 1952. 1736 [Henk08] Christian Henke, Evaluation of Hash Functions for 1737 Multipoint Sampling in IP Networks, Diploma Thesis, 1738 TU Berlin, April 2008. 1740 [HeSZ08] Christian Henke, Carsten Schmoll, Tanja Zseby, 1741 Evaluation of Header Field Entropy for Hash-Based 1742 Packet Selection, Proceedings of Passive and Active 1743 Measurement Conference PAM 2008, Cleveland, Ohio, 1744 USA, April 2008. 1746 [RFC5102] J. Quittek, S. Bryant, B. Claise, P. Aitken, J. 1747 Meyer, "Information Model for IP Flow Information 1748 Export", RFC 5102, January 2008. 1750 [RFC5101] B. Claise (Editor) "Specification of the IPFIX 1751 Protocol for the Exchange of IP Traffic Flow 1752 Information", RFC 5101, January 2008. 1754 [Jenk97] B. Jenkins, "Algorithm Alley", Dr. Dobb's Journal, 1755 September 1997. 1756 http://burtleburtle.net/bob/hash/doobs.html 1758 [JePP92] Jonathan Jedwab, Peter Phaal, Bob Pinna, "Traffic 1759 Estimation for the Largest Sources on a Network, 1760 Using Packet Sampling with Limited Storage", HP 1761 technical report, Managemenr, Mathematics and 1762 Security Department, HP Laboratories, Bristol, 1763 March 1992, 1764 http://www.hpl.hp.com/techreports/92/HPL-92-35.html 1766 [Moli03] M.Molina, "A scalable and efficient methodology for 1767 flow monitoring in the Internet", International 1768 Teletraffic Congress (ITC-18), Berlin, Sep. 2003 1770 [MoND05] M. Molina, S.Niccolini, N.G.Duffield "A Comparative 1771 Experimental Study of Hash Functions Applied to 1772 Packet Sampling" International Teletraffic Congress 1773 (ITC-19), Beijing, August 2005. 1775 [PSAMP-FW] Nick Duffield (Ed.), "A Framework for Packet 1776 Selection and Reporting", RFC XXXX [currently 1777 Internet Draft draft-ietf-psamp-framework-11, work 1778 in progress, May 2007]. 1780 [PSAMP-INFO] T. Dietz, F. Dressler, G. Carle, B. Claise, 1781 "Information Model for Packet Sampling Exports", 1782 RFC XXXX. [Currently Internet Draft, draft-ietf- 1783 psamp-info-06, June 2007] 1785 [PSAMP-PROTO] B. Claise (Ed.), "Packet Sampling (PSAMP) Protocol 1786 Specifications", RFC XXXX. [Currently Internet 1787 Draft draft-ietf-psamp-protocol-07.txt, work in 1788 progress, October 2006]. 1790 [RFC1141] T. Mallory, A. Kullberg, "Incremental Updating of 1791 the Internet Checksum", RFC 1141, January 1990 1792 (updated by RFC1624). 1794 [RFC1624] A. Rijsinghani, Computation of the Internet 1795 Checksum via Incremental Update, RFC1624, May 1994 1797 [RFC2205] R. Braden (Ed.), L. Zhang, S. Berson, S. Herzog, S. 1798 Jamin, Resource ReSerVation Protocol (RSVP) - 1799 Version 1 Functional Specification, RFC2205, 1800 September 1997 1802 [RFC3704] F. Baker, P. Savola, Ingress Filtering for 1803 Multihomed Networks, RFC3704, March 2004 1805 [RFC3917] J. Quittek, T. Zseby, B. Claise, S. Zander, 1806 "Requirements for IP Flow Information Export", RFC 1807 3917, October 2004. 1809 [RFC4271] Y. Rekhter, T. Li, S. Hares, "A Border Gateway 1810 Protocol 4 (BGP-4)", RFC 4271, January 2006. 1812 [Zseb03] T. Zseby, "Stratification Strategies for Sampling- 1813 based Non-intrusive Measurement of One-way Delay", 1814 Proceedings of Passive and Active Measurement 1815 Workshop (PAM 2003), La Jolla, CA, USA, pp. 171- 1816 179, April 2003. 1818 [ZsZC01] Tanja Zseby, Sebastian Zander, Georg Carle. 1819 Evaluation of Building Blocks for Passive One-way- 1820 delay Measurements. Proceedings of Passive and 1821 Active Measurement Workshop (PAM 2001), Amsterdam, 1822 The Netherlands, April 23-24, 2001. 1824 14. Authors' Addresses 1826 Tanja Zseby 1827 Fraunhofer Institute for Open Communication Systems 1828 Kaiserin-Augusta-Allee 31 1829 10589 Berlin 1830 Germany 1831 Phone: +49-30-34 63 7153 1832 Email: tanja.zseby@fokus.fraunhofer.de 1834 Maurizio Molina 1835 DANTE 1836 City House 1837 126-130 Hills Road 1838 Cambridge CB21PQ 1839 United Kingdom 1840 Phone: +44 1223 371 300 1841 Email: maurizio.molina@dante.org.uk 1843 Nick Duffield 1844 AT&T Labs - Research 1845 Room B-139 1846 180 Park Ave 1847 Florham Park NJ 07932, USA 1848 Phone: +1 973-360-8726 1849 Email: duffield@research.att.com 1851 Saverio Niccolini 1852 Network Laboratories, NEC Europe Ltd. 1853 Kurfuerstenanlage 36 1854 69115 Heidelberg 1855 Germany 1856 Phone: +49-6221-9051118 1857 Email: saverio.niccolini@netlab.nec.de 1859 Fredric Raspall 1860 EPSC-UPC 1861 Dept. of Telematics 1862 Av. del Canal Olimpic, s/n 1863 Edifici C4 1864 E-08860 Castelldefels, Barcelona 1865 Spain 1866 Email: fredi@entel.upc.es 1868 15. Contributors 1870 Sharon Goldberg contributed to the security considerations 1871 for hash-based selection. 1873 Sharon Goldberg 1874 Department of Electrical Engineering 1875 Princeton University 1876 F210-K EQuad 1877 Princeton, NJ 08544, USA 1878 Email: goldbe@princeton.edu 1880 16. Intellectual Property Statement 1882 The IETF has been notified of intellectual property rights 1883 claimed in regard to some or all of the specification contained 1884 in this document. For more information consult the online list 1885 of claimed rights. 1887 The IETF takes no position regarding the validity or scope of 1888 any Intellectual Property Rights or other rights that might be 1889 claimed to pertain to the implementation or use of the 1890 technology described in this document or the extent to which any 1891 license under such rights might or might not be available; nor 1892 does it represent that it has made any independent effort to 1893 identify any such rights. Information on the procedures with 1894 respect to rights in RFC documents can be found in BCP 78 and 1895 BCP 79. 1897 Copies of IPR disclosures made to the IETF Secretariat and any 1898 assurances of licenses to be made available, or the result of an 1899 attempt made to obtain a general license or permission for the 1900 use of such proprietary rights by implementers or users of this 1901 specification can be obtained from the IETF on-line IPR 1902 repository at http://www.ietf.org/ipr. 1904 The IETF invites any interested party to bring to its attention 1905 any copyrights, patents or patent applications, or other 1906 proprietary rights that may cover technology that may be 1907 required to implement this standard. Please address the 1908 information to the IETF at ietf-ipr@ietf.org. 1910 17. Copyright Statement 1912 Copyright (C) The IETF Trust (2008). 1914 This document is subject to the rights, licenses and 1915 restrictions contained in BCP 78, and except as set forth 1916 therein, the authors retain all their rights. 1918 18. Disclaimer 1920 This document and the information contained herein are provided 1921 on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE 1922 REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, 1923 THE IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM 1924 ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO 1925 ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT 1926 INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY 1927 OR FITNESS FOR A PARTICULAR PURPOSE. 1929 Appendix A: Hash Functions 1931 A.1 IP Shift-XOR (IPSX) Hash Function 1933 The IPSX Hash Function is tailored for acting on IP version 4 1934 packets. It exploits the structure of IP packet and in 1935 particular the variability expected to be exhibited within 1936 different fields of the IP packet in order to furnish a hash 1937 value with little apparent correlation with individual packet 1938 fields. Fields from the IPv4 and TCP/UDP headers are used as 1939 input. The IPSX Hash Function uses a small number of simple 1940 instructions. 1942 Input parameters: None 1944 Built-in parameters: None 1946 Output: The output of the IPSX is a 16 bit number 1948 Functioning: 1950 The functioning can be divided into two parts: input selection, 1951 which forms are composite input from various portions of the IP 1952 packet, followed by computation of the hash on the composite. 1954 Input Selection: 1955 The raw input is drawn from the first 20 bytes of the IP packet 1956 header and the first 8 bytes of the IP payload. If IP options 1957 are not used, the IP header has 20 bytes, and hence the two 1958 portions adjoin and comprise the first 28 bytes of the IP 1959 packet. We now use the raw input as 4 32-bit subportions of 1960 these 28 bytes. We specify the input by bit offsets from the 1961 start of IP header or payload. 1963 f1 = bits 32 to 63 of the IP header, comprising the IP 1964 identification field, flags, and fragment offset. 1966 f2 = bits 96 to 127 of the IP header, the source IP address. 1968 f3 = bits 128 to 159 of the IP header, the destination IP 1969 address. 1971 f4 = bits 32 to 63 of the IP payload. For a TCP packet, f4 1972 comprises the TCP sequence number followed by the message 1973 length. For a UDP packet f4 comprises the UDP checksum. 1975 Hash Computation: 1976 The hash is computed from f1, f2, f3 and f4 by a combination of 1977 XOR (^), right shift (>>) and left shift (<<) operations. The 1978 intermediate quantities h1, v1, v2 are 32-bit numbers. 1980 1. v1 = f1 ^ f2; 1981 2. v2 = f3 ^ f4; 1982 3. h1 = v1 << 8; 1983 4. h1 ^= v1 >> 4; 1984 5. h1 ^= v1 >> 12; 1985 6. h1 ^= v1 >> 16; 1986 7. h1 ^= v2 << 6; 1987 8. h1 ^= v2 << 10; 1988 9. h1 ^= v2 << 14; 1989 10. h1 ^= v2 >> 7; 1991 The output of the hash is the least significant 16 bits of h1. 1993 A.2 BOB Hash Function 1995 The BOB Hash Function is a Hash Function designed for having 1996 each bit of the input affecting every bit of the return value 1997 and using both 1-bit and 2-bit deltas to achieve the so called 1998 avalanche effect [Jenk97]. This function was originally built 1999 for hash table lookup with fast software implementation. 2001 Input Parameters: 2002 The input parameters of such a function are: 2003 - the length of the input string (key) to be hashed, in bytes. 2004 The elementary input blocks of Bob hash are the single bytes, 2005 therefore no padding is needed. 2006 - an init value (an arbitrary 32-bit number). 2008 Built in parameters: 2009 The Bob Hash uses the following built-in parameter: 2010 - the golden ratio (an arbitrary 32-bit number used in the hash 2011 function computation: its purpose is to avoid mapping all zeros 2012 to all zeros); 2014 Note: the mix sub-function (see mix (a,b,c) macro in the 2015 reference code in 3.2.4) has a number of parameters governing 2016 the shifts in the registers. The one presented is not the only 2017 possible choice. 2019 It is an open point whether these may be considered additional 2020 built-in parameters to specify at function configuration. 2022 Output. 2023 The output of the BOB function is a 32-bit number. It should be 2024 specified: 2025 - A 32 bit mask to apply to the output 2026 - The selection range as a list of non overlapping intervals 2027 [start value, end value] where value is in [0,2^32] 2029 Functioning: 2030 The hash value is obtained computing first an initialization of 2031 an internal state (composed of 3 32-bit numbers, called a, b, c 2032 in the reference code below), then, for each input byte of the 2033 key the internal state is combined by addition and mixed using 2034 the mix sub-function. Finally, the internal state mixed one last 2035 time and the third number of the state (c) is chosen as the 2036 return value. 2038 typedef unsigned long int ub4; /* unsigned 4-byte quantities 2039 */ 2040 typedef unsigned char ub1; /* unsigned 1-byte quantities 2041 */ 2043 #define hashsize(n) ((ub4)1<<(n)) 2044 #define hashmask(n) (hashsize(n)-1) 2045 /* ------------------------------------------------------ 2046 mix -- mix 3 32-bit values reversibly. 2047 For every delta with one or two bits set, and the deltas of 2048 all three high bits or all three low bits, whether the original 2049 value of a,b,c is almost all zero or is uniformly distributed, 2050 * If mix() is run forward or backward, at least 32 bits in 2051 a,b,c have at least 1/4 probability of changing. 2052 * If mix() is run forward, every bit of c will change between 2053 1/3 and 2/3 of the time. (Well, 22/100 and 78/100 for some 2- 2054 bit deltas.) mix() was built out of 36 single-cycle latency 2055 instructions in a structure that could supported 2x parallelism, 2056 like so: 2057 a -= b; 2058 a -= c; x = (c>>13); 2059 b -= c; a ^= x; 2060 b -= a; x = (a<<8); 2061 c -= a; b ^= x; 2062 c -= b; x = (b>>13); 2063 ... 2064 Unfortunately, superscalar Pentiums and Sparcs can't take 2065 advantage of that parallelism. They've also turned some of 2066 those single-cycle latency instructions into multi-cycle latency 2067 instructions 2069 ------------------------------------------------------------*/ 2071 #define mix(a,b,c) \ 2072 { \ 2073 a -= b; a -= c; a ^= (c>>13); \ 2074 b -= c; b -= a; b ^= (a<<8); \ 2075 c -= a; c -= b; c ^= (b>>13); \ 2076 a -= b; a -= c; a ^= (c>>12); \ 2077 b -= c; b -= a; b ^= (a<<16); \ 2078 c -= a; c -= b; c ^= (b>>5); \ 2079 a -= b; a -= c; a ^= (c>>3); \ 2080 b -= c; b -= a; b ^= (a<<10); \ 2081 c -= a; c -= b; c ^= (b>>15); \ 2082 } 2084 /* ----------------------------------------------------------- 2085 hash() -- hash a variable-length key into a 32-bit value 2086 k : the key (the unaligned variable-length array of bytes) 2087 len : the length of the key, counting by bytes 2088 initval : can be any 4-byte value 2089 Returns a 32-bit value. Every bit of the key affects every bit 2090 of the return value. Every 1-bit and 2-bit delta achieves 2091 avalanche. About 6*len+35 instructions. 2093 The best hash table sizes are powers of 2. There is no need to 2094 do mod a prime (mod is sooo slow!). If you need less than 32 2095 bits, use a bitmask. For example, if you need only 10 bits, do 2096 h = (h & hashmask(10)); 2097 In which case, the hash table should have hashsize(10) elements. 2099 If you are hashing n strings (ub1 **)k, do it like this: 2100 for (i=0, h=0; i= 12) 2126 { 2127 a += (k[0] +((ub4)k[1]<<8) +((ub4)k[2]<<16) 2128 +((ub4)k[3]<<24)); 2129 b += (k[4] +((ub4)k[5]<<8) +((ub4)k[6]<<16) 2130 +((ub4)k[7]<<24)); 2131 c += (k[8] +((ub4)k[9]<<8) 2132 +((ub4)k[10]<<16)+((ub4)k[11]<<24)); 2133 mix(a,b,c); 2134 k += 12; len -= 12; 2135 } 2137 /*---------------------------- handle the last 11 bytes */ 2138 c += length; 2139 switch(len) /* all the case statements fall through*/ 2140 { 2141 case 11: c+=((ub4)k[10]<<24); 2142 case 10: c+=((ub4)k[9]<<16); 2143 case 9 : c+=((ub4)k[8]<<8); 2144 /* the first byte of c is reserved for the length */ 2145 case 8 : b+=((ub4)k[7]<<24); 2146 case 7 : b+=((ub4)k[6]<<16); 2147 case 6 : b+=((ub4)k[5]<<8); 2148 case 5 : b+=k[4]; 2149 case 4 : a+=((ub4)k[3]<<24); 2150 case 3 : a+=((ub4)k[2]<<16); 2151 case 2 : a+=((ub4)k[1]<<8); 2152 case 1 : a+=k[0]; 2153 /* case 0: nothing left to add */ 2154 } 2155 mix(a,b,c); 2156 /*-------------------------------- report the result */ 2157 return c; 2158 }