idnits 2.17.1 draft-ietf-psamp-sample-tech-07.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** It looks like you're using RFC 3978 boilerplate. You should update this to the boilerplate described in the IETF Trust License Policy document (see https://trustee.ietf.org/license-info), which is required now. -- Found old boilerplate from RFC 3978, Section 5.1 on line 24. -- Found old boilerplate from RFC 3978, Section 5.5 on line 1597. -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 1566. -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 1573. -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 1579. ** This document has an original RFC 3978 Section 5.4 Copyright Line, instead of the newer IETF Trust Copyright according to RFC 4748. ** This document has an original RFC 3978 Section 5.5 Disclaimer, instead of the newer disclaimer which includes the IETF Trust according to RFC 4748. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- == No 'Intended status' indicated for this document; assuming Proposed Standard Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) == There are 10 instances of lines with non-RFC6890-compliant IPv4 addresses in the document. If these are example addresses, they should be changed. ** The document seems to lack a both a reference to RFC 2119 and the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. RFC 2119 keyword, line 947: '...ented, the BOB function SHOULD be used...' RFC 2119 keyword, line 948: '...tions. Other functions, like IPSX, MAY...' RFC 2119 keyword, line 965: '...he following input bytes MUST be used....' RFC 2119 keyword, line 990: '...he following input bytes MUST be used....' RFC 2119 keyword, line 1021: '...(such as CRC-32) MAY be used. Among th...' (1 more instance...) Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not match the current year == Line 664 has weird spacing: '...uniform proba...' == Line 696 has weird spacing: '...Uniform is cl...' == Line 1042 has weird spacing: '...a given range...' == Line 1464 has weird spacing: '...ampling with...' == Line 1705 has weird spacing: '...typedef unsig...' == (1 more instance...) -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (July 2005) is 6831 days in the past. Is this intentional? -- Found something which looks like a code comment -- if you have code sections in the document, please surround them with '' and '' lines. Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'PSAMP-TECH' is mentioned on line 151, but not defined -- Looks like a reference, but probably isn't: '1' on line 1795 == Missing Reference: 'N' is mentioned on line 649, but not defined -- Looks like a reference, but probably isn't: '0' on line 1820 == Missing Reference: 'DuGr01' is mentioned on line 875, but not defined -- Looks like a reference, but probably isn't: '00' on line 1233 == Missing Reference: 'FF' is mentioned on line 1233, but not defined -- Looks like a reference, but probably isn't: '2' on line 1795 -- Looks like a reference, but probably isn't: '4' on line 1816 -- Looks like a reference, but probably isn't: '5' on line 1797 -- Looks like a reference, but probably isn't: '6' on line 1797 -- Looks like a reference, but probably isn't: '8' on line 1799 -- Looks like a reference, but probably isn't: '9' on line 1799 == Unused Reference: 'CoGi98' is defined on line 1430, but no explicit reference was found in the text == Outdated reference: A later version (-13) exists of draft-ietf-psamp-framework-08 ** Downref: Normative reference to an Informational draft: draft-ietf-psamp-framework (ref. 'PSAMP-FW') == Outdated reference: A later version (-06) exists of draft-ietf-psamp-mib-03 -- Possible downref: Normative reference to a draft: ref. 'PSAMP-MIB' == Outdated reference: A later version (-09) exists of draft-ietf-psamp-protocol-01 == Outdated reference: A later version (-11) exists of draft-ietf-psamp-info-02 == Outdated reference: A later version (-15) exists of draft-ietf-ipfix-info-04 ** Downref: Normative reference to an Informational RFC: RFC 3917 (ref. 'IPFIX-REQ') -- Obsolete informational reference (is this intentional?): RFC 1771 (Obsoleted by RFC 4271) Summary: 7 errors (**), 0 flaws (~~), 19 warnings (==), 19 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Internet Draft 3 Document: T. Zseby 4 Expires: January 2006 Fraunhofer FOKUS 5 M. Molina 6 DANTE 7 N. Duffield 8 AT&T Labs-Research 9 S. Niccolini 10 NEC Europe Ltd. 11 F. Raspall 12 EPSC-UPC 14 July 2005 16 Sampling and Filtering Techniques for IP Packet Selection 18 Status of this Memo 20 By submitting this Internet-Draft, each author represents that 21 any applicable patent or other IPR claims of which he or she is 22 aware have been or will be disclosed, and any of which he or she 23 becomes aware will be disclosed, in accordance with Section 6 of 24 BCP 79. 26 Internet-Drafts are working documents of the Internet 27 Engineering Task Force (IETF), its areas, and its working 28 groups. Note that other groups may also distribute working 29 documents as Internet-Drafts. 30 Internet-Drafts are draft documents valid for a maximum of six 31 months and may be updated, replaced, or obsoleted by other 32 documents at any time. It is inappropriate to use Internet- 33 Drafts as reference material or to cite them other than as "work 34 in progress." 36 The list of current Internet-Drafts can be accessed at 37 http://www.ietf.org/ietf/1id-abstracts.txt. 39 The list of Internet-Draft Shadow Directories can be accessed at 40 http://www.ietf.org/shadow.html. 42 Copyright Notice 44 Copyright (C) The Internet Society (2005). 46 Abstract 47 This document describes Sampling and Filtering techniques for IP 48 packet selection. It provides a categorization of schemes and 49 defines what parameters are needed to describe the most common 50 selection schemes. Furthermore it shows how techniques can be 51 combined to build more elaborate packet Selectors. The document 52 provides the basis for the definition of information models for 53 configuring selection techniques in Measurement Processes and 54 for reporting the technique in use to a Collector. 56 Table of Contents 58 1. Introduction.................................................4 59 2. PSAMP Documents Overview.....................................4 60 3. Terminology..................................................5 61 3.1 Observation Points, Packet Streams and Packet Content.....5 62 3.2 Selection Process.........................................6 63 3.3 Reporting Process.........................................7 64 3.4 Measurement Process.......................................8 65 3.5 Exporting Process.........................................8 66 3.6 PSAMP Device..............................................9 67 3.7 Collector.................................................9 68 3.8 Selection Methods.........................................9 69 4. Categorization of Packet Selection Techniques...............11 70 5. Sampling....................................................13 71 5.1 Systematic Sampling......................................14 72 5.2 Random Sampling..........................................15 73 5.2.1 n-out-of-N Sampling......................................15 74 5.2.2 Probabilistic Sampling...................................15 75 5.2.2.1 Uniform Probabilistic Sampling...........................15 76 5.2.2.2 Non-Uniform Probabilistic Sampling.......................16 77 5.2.2.3 Non-Uniform Flow State Dependent Sampling................16 78 5.2.2.4 Configuration of non-uniform probabilistic and flow- 79 state Sampling..........................................17 80 6. Filtering...................................................17 81 6.1 Field Match Filtering....................................17 82 6.2 Hash-based Filtering.....................................18 83 6.2.1 Application Examples for Hash-based Selection............18 84 6.2.1.1 Approximation of Random Sampling.........................18 85 6.2.1.2 Trajectory Sampling and Consistent packet selection......19 86 6.2.2 Guarding Against Pitfalls and Vulnerabilities............20 87 6.2.3 Recommended Hash Functions...............................20 88 6.2.3.1 Hash Functions Suitable for Packet Selection.............21 89 6.2.3.2 Input Bytes for the BOB Hash Function for IPv4...........21 90 6.2.3.3 Input Bytes for the BOB Hash Function for IPv6...........22 91 6.2.3.4 Hash Functions Suitable for Packet Digesting.............22 92 6.3 Router State Filtering...................................23 93 7. Parameters for the Description of Selection Techniques......23 94 7.1 Description of Sampling Techniques.......................24 95 7.2 Description of Filtering Techniques......................26 96 8. Composite Techniques........................................28 97 8.1 Cascaded Filtering->Sampling or Sampling->Filtering......28 98 8.2 Stratified Sampling......................................29 99 9. Security Considerations.....................................29 100 10. Acknowledgements............................................30 101 11. Normative References........................................30 102 12. Informative References......................................31 103 13. Authors' Addresses..........................................33 104 14. Intellectual Property Statement.............................33 105 15. Copyright Statement.........................................34 106 16. Disclaimer..................................................34 107 17. Appendix: Hash Functions....................................34 108 17.1 IP Shift-XOR (IPSX) Hash Function........................35 109 17.2 BOB Hash Function........................................36 111 1. Introduction 113 There are two main drivers for the growth in measurement 114 infrastructures and their underlying technology. First, network 115 data rates are increasing, with a concomitant growth in 116 measurement data. Secondly, the growth is compounded by the 117 demand by measurement-based applications for increasingly fine 118 grained traffic measurements. Devices such as routers, which 119 perform the measurements, require increasingly sophisticated and 120 resource intensive measurement capabilities, including the 121 capture of packet headers or even parts of the payload, and 122 classification for flow analysis. All these factors can lead to 123 an overwhelming amount of measurement data, resulting in high 124 demands on resources for measurement, storage, transport and 125 post processing. 127 The sustained capture of network traffic at line rate can be 128 performed by specialized measurement hardware. However, the cost 129 of the hardware and the measurement infrastructure required to 130 accommodate the measurements preclude this as a ubiquitous 131 approach. Instead some form of data reduction at the point of 132 measurement is necessary. This can be achieved by an intelligent 133 packet selection through Sampling, Filtering, or aggregation. 134 The motivation for Sampling is to select a representative subset 135 of packets that allow accurate estimates of properties of the 136 unsampled traffic to be formed. The motivation for Filtering is 137 to remove all packets that are not of interest. Aggregation 138 combines data and allows compact pre-defined views of the 139 traffic. Examples of applications that benefit from packet 140 selection are given in [PSAMP-FW]. Aggregation techniques are 141 out of scope of this document. 143 2. PSAMP Documents Overview 145 [PSAMP-FW]: "A Framework for Packet Selection and Reporting" 146 describes the PSAMP framework for network elements 147 to select subsets of packets by statistical and 148 other methods, and to export a stream of reports 149 on the selected packets to a Collector. 151 [PSAMP-TECH]: "Sampling and Filtering Techniques for IP Packet 152 Selection" (this document) describes the set of 153 packet selection techniques supported by PSAMP. 155 [PSAMP-MIB]: "Definitions of Managed Objects for Packet 156 Sampling" describes the PSAMP Management 157 Information Base. 159 [PSAMP-PROTO]: "Packet Sampling (PSAMP) Protocol Specifications" 160 specifies the export of packet information from a 161 PSAMP Exporting Process to a PSAMP Colleting 162 Process. 164 [PSAMP-INFO]: "Information Model for Packet Sampling Exports" 165 defines an information and data model for PSAMP. 167 3. Terminology 169 The PSAMP terminology defined here is fully consistent with all 170 terms listed in [PSAMP-FW] but includes additional terms 171 required for the description of packet selection methods. An 172 architecture overview and possible configurations of PSAMP 173 elements can be found in [PSAMP-FW]. PSAMP terminology also aims 174 to be consistent with terms used in [IPFIX-REQ]. The 175 relationship between some PSAMP and IPFIX terms is described in 176 [PSAMP-FW]. 178 3.1 Observation Points, Packet Streams and Packet Content 180 * Observation Point 182 An Observation Point is a location in the network where 183 packets can be observed. Examples include: 185 (i) a line to which a probe is attached; 187 (ii) a shared medium, such as an Ethernet-based LAN; 189 (iii) a single port of a router, or set of interfaces 190 (physical or logical) of a router; 192 (iv) an embedded measurement subsystem within an interface. 194 Note that one Observation Point may be a superset of several 195 other Observation Points. For example one Observation Point 196 can be an entire line card. This would be the superset of the 197 individual Observation Points at the line card's interfaces. 199 * Observed Packet Stream 200 The Observed Packet Stream is the set of all packets observed 201 at the Observation Point. 203 * Packet Stream 205 A packet stream denotes a subset of the Observed Packet Stream 206 that flows past some specified point within the measurement 207 process. An example of a Packet Stream is the output of the 208 selection process. 210 * Packet Stream 212 A packet stream denotes a set of packets that flows past some 213 specified point within the measurement process. An example of 214 a Packet Stream is the output of the selection process. 215 Note that packets selected from a stream, e.g. by Sampling, do 216 not necessarily possess a property by which they can be 217 distinguished from packets that have not been selected. For 218 this reason the term "stream" is favored over "flow", which is 219 defined as set of packets with common properties [IPFIX-REQ]. 221 * Packet Content 223 The packet content denotes the union of the packet header 224 (which includes link layer, network layer and other 225 encapsulation headers) and the packet payload. 227 3.2 Selection Process 229 * Selection Process 231 A Selection Process takes the Observed Packet Stream as its 232 input and selects a subset of that stream as its output. 234 * Selection State 236 A Selection Process may maintain state information for use by 237 the Selection Process and/or the reporting process. At a given 238 time, the Selection State may depend on packets observed at 239 and before that time, and other variables. Examples include: 241 (i) sequence numbers of packets at the input of Selectors; 243 (ii) a timestamp of observation of the packet at the 244 Observation Point; 246 (iii) iterators for pseudorandom number generators; 247 (iv) hash values calculated during selection; 249 (v) indicators of whether the packet was selected by a 250 given Selector; 252 Selection Processes may change portions of the Selection State 253 as a result of processing a packet. Selection state for a 254 packet is to reflect the state after processing the packet. 256 * Selector 258 A Selector defines the action of a Selection Process on a 259 single packet of its input. If selected, the packet becomes an 260 element of the output Packet Stream. 262 The Selector can make use of the following information in 263 determining whether a packet is selected: 265 (i) the packet's content; 267 (ii) information derived from the packet's treatment at the 268 Observation Point; 270 (iii) any selection state that may be maintained by the 271 Selection Process. 273 * Composite Selector 275 A Composite Selector is an ordered composition of Selectors, 276 in which the output Packet Stream issuing from one Selector 277 forms the input Packet Stream to the succeeding Selector. 279 * Primitive Selector 281 A Selector is primitive if it is not a Composite Selector. 283 3.3 Reporting Process 285 * Reporting Process: 287 A Reporting Process creates a Report Stream on packets 288 selected by a Selection Process, in preparation for export. 289 The input to the Reporting Process comprises that information 290 available to the Selection Process per selected packet, 291 specifically: 293 (i) the selected packet's content; 294 (ii) information derived from the selected packet's 295 treatment at the Observation Point; 297 (iii) any Selection State maintained by the inputting 298 Selection Process, reflecting any modifications to the 299 Selection State made during selection of the packet. 301 * Packet Reports 303 Packet Reports comprise a configurable subset of a packet's 304 input to the reporting process, including the packet's 305 content, information relating to its treatment (for example, 306 the output interface), and its associated selection state (for 307 example, a hash of the packet's content) 309 * Report Interpretation: 311 Report Interpretation comprises subsidiary information, 312 relating to one or more packets, that is used for 313 interpretation of their packet reports. Examples include 314 configuration parameters of the Selection Process and of the 315 Reporting Process. 317 * Report Stream: 319 The Report Stream is the output of a Reporting Process, 320 comprising two distinguished types of information: Packet 321 Reports, and Report Interpretation. 323 3.4 Measurement Process 325 * Measurement Process 327 A Measurement Process is the composition of a Selection 328 Process that takes the Observed Packet Stream as its input, 329 followed by a Reporting Process. 331 3.5 Exporting Process 333 * Exporting Process: 335 An Exporting Process sends, in the form of Export Packet, the 336 output of one or more Measurement Processes to one or more 337 Collectors. 339 * Export Packet: 341 An Export Packet is a combination of Report Interpretation 342 and/or one or more Packet Reports are bundled by the Exporting 343 Process into a Export Packet for exporting to a Collector. 345 3.6 PSAMP Device 347 * PSAMP Device 349 A PSAMP Device is a device hosting at least an Observation 350 Point, a Measurement Process and an Exporting Process. 351 Typically, corresponding Observation Point(s), Measurement 352 Process(es) and Exporting Process(es) are co-located at this 353 device, for example at a router. 355 3.7 Collector 357 * Collector 359 A Collector receives a Report Stream exported by one or more 360 Exporting Processes. In some cases, the host of the 361 Measurement and/or Exporting Processes may also serve as the 362 Collector. 364 3.8 Selection Methods 366 * Filtering 368 A filter is a Selector that selects a packet deterministically 369 based on the Packet Content, or its treatment, or functions of 370 these occurring in the Selection State. Examples include field 371 match Filtering, and Hash-based Selection. 373 * Sampling 375 A Selector that is not a filter is called a Sampling 376 operation. This reflects the intuitive notion that if the 377 selection of a packet cannot be determined from its content 378 alone, there must be some type of Sampling taking place. 380 * Content-independent Sampling 382 A Sampling operation that does not use Packet Content (or 383 quantities derived from it) as the basis for selection is 384 called a Content-independent Sampling operation. Examples 385 include systematic Sampling, and uniform pseudorandom Sampling 386 driven by a pseudorandom number whose generation is 387 independent of Packet Content. Note that in Content- 388 independent Sampling it is not necessary to access the Packet 389 Content in order to make the selection decision. 391 * Content-dependent Sampling 393 A Sampling operation where selection is dependent on Packet 394 Content is called a Content-dependent Sampling operation. 395 Examples include pseudorandom selection according to a 396 probability that depends on the contents of a packet field. 397 Note that this is not a filter, because the selection is not 398 deterministic. 400 * Hash Domain 402 A subset of the Packet Content and the packet treatment, 403 viewed as an N-bit string for some positive integer N. 405 * Hash Range 407 A set of M-bit strings for some positive integer M that define 408 the range of values the result of the hash operation can take. 410 * Hash Function 412 A deterministic map from the Hash Domain into the Hash Range. 414 * Hash Selection Range 416 A subset of the Hash Range. The packet is selected if the 417 action of the Hash Function on the Hash Domain for the packet 418 yields a result in the Hash Selection Range. 420 * Hash-based Selection 422 Filtering specified by a Hash Domain, a Hash Function, and 423 Hash Range and a Hash Selection Range. 425 * Approximative Selection 427 Selectors in any of the above categories may be approximated 428 by operations in the same or another category for the purposes 429 of implementation. For example, uniform pseudorandom Sampling 430 may be approximated by Hash-based Selection, using a suitable 431 Hash Function and Hash Domain. In this case, the closeness of 432 the approximation depends on the choice of Hash Function and 433 Hash Domain. 435 * Population 437 A Population is a Packet Stream, or a subset of a Packet 438 Stream. A Population can be considered as a base set from 439 which packets are selected. An example is all packets in the 440 Observed Packet Stream that are observed within some specified 441 time interval. 443 * Population Size 445 The Population Size is the number of all packets in the 446 Population. 448 * Sample Size 450 The number of packets selected from the Population by a 451 Selector. 453 * Configured Selection Fraction 455 The Configured Selection Fraction is the ratio of the number 456 of packets selected by a Selector from an input Population, to 457 the Population Size, as based on the configured selection 458 parameters. 460 * Attained Selection Fraction 462 The Attained Selection Fraction is the actual ratio of the 463 number of packets selected by a Selector from an input 464 Population, to the Population Size. 466 For some sampling methods the Attained Selection Fraction can 467 differ from the Configured Selection Fraction due to, for 468 example, the inherent statistical variability in sampling 469 decisions of probabilistic Sampling and Hash-based Selection. 470 Nevertheless, for large Population Sizes and properly configured 471 Selectors, the Attained Selection Fraction usually approaches 472 the Configured Selection Fraction. 474 4. Categorization of Packet Selection Techniques 476 Packet selection techniques generate a subset of packets from an 477 Observed Packet Stream at an Observation Point. We distinguish 478 between Sampling and Filtering. 480 Sampling is targeted at the selection of a representative subset 481 of packets. The subset is used to infer knowledge about the 482 whole set of observed packets without processing them all. The 483 selection can depend on packet position, and/or on packet 484 content, and/or on (pseudo) random decisions. 486 Filtering selects a subset with common properties. This is used 487 if only a subset of packets is of interest. The properties can 488 be directly derived from the packet content, or depend on the 489 treatment given by the router to the packet. Filtering is a 490 deterministic operation. It depends on packet content or router 491 treatment. It never depends on packet position or on (pseudo) 492 random decisions. 494 Note that a common technique to select packets is to compute a 495 Hash Function on some bits of the packet header and/or content 496 and to select it if the Hash Value falls in the Hash Selection 497 Range. Since hashing is a deterministic operation on the packet 498 content, it is a Filtering technique according to our 499 categorization. Nevertheless, Hash Functions are sometimes used 500 to emulate random Sampling. Depending on the chosen input bits, 501 the Hash Function and the Hash Selection Range, this technique 502 can be used to emulate the random selection of packets with a 503 given probability p. It is also a powerful technique to 504 consistently select the same packet subset at multiple 505 Observation Points [DuGr00] 507 The following table gives an overview of the schemes described 508 in this document and their categorization. An X in brackets (X) 509 denotes schemes for which also content-independent variants 510 exist. It easily can be seen that only schemes with both 511 properties, content dependence and deterministic selection, are 512 considered as filters. 514 Selection Scheme | Deterministic | Content- | Category 515 | Selection | dependent| 516 ------------------------+---------------+----------+---------- 517 Systematic | X | _ | Sampling 518 Count-based | | | 519 ------------------------+---------------+----------+---------- 520 Systematic | X | - | Sampling 521 Time-based | | | 522 ------------------------+---------------+----------+---------- 523 Random | - | - | Sampling 524 n-out-of-N | | | 525 ------------------------+---------------+----------+---------- 526 Random | - | - | Sampling 527 Uniform probabilistic | | | 528 ------------------------+---------------+----------+---------- 529 Random | - | (X) | Sampling 530 Non-uniform probabil. | | | 531 ------------------------+---------------+----------+---------- 532 Random | - | (X) | Sampling 533 Non-uniform flow-state | | | 535 ------------------------+---------------+----------+---------- 536 Field match filtering | X | X | Filtering 537 ------------------------+---------------+----------+---------- 538 Hash Function | X | X | Filtering 539 ------------------------+---------------+----------+---------- 540 Router state filtering | X | (X) | Filtering 541 ------------------------+---------------+----------+---------- 543 The categorization just introduced is mainly useful for the 544 definition of an information model describing Primitive 545 Selectors. More complex selection techniques can be described 546 through the composition of cascaded Sampling and Filtering 547 operations. For example, a packet selection that weights the 548 selection probability on the basis of the packet length can be 549 described as a cascade of a Filtering and a Sampling scheme. 550 However, this descriptive approach is not intended to be rigid: 551 if a common and consolidated selection practice turns out to be 552 too complex to be described as a composition of the mentioned 553 building blocks, an ad hoc description can be specified instead 554 and added as a new scheme to the information model. 556 5. Sampling 558 The deployment of Sampling techniques aims at the provisioning 559 of information about a specific characteristic of the parent 560 population at a lower cost than a full census would demand. In 561 order to plan a suitable Sampling strategy it is therefore 562 crucial to determine the needed type of information and the 563 desired degree of accuracy in advance. 565 First of all it is important to know the type of metric that 566 should be estimated. The metric of interest can range from 567 simple packet counts [JePP92] up to the estimation of whole 568 distributions of flow characteristics (e.g. packet 569 sizes)[ClPB93]. 571 Secondly, the required accuracy of the information and with 572 this, the confidence that is aimed at, should be known in 573 advance. For instance for usage-based accounting the required 574 confidence for the estimation of packet counters can depend on 575 the monetary value that corresponds to the transfer of one 576 packet. That means that a higher confidence could be required 577 for expensive packet flows (e.g. premium IP service) than for 578 cheaper flows (e.g. best effort). The accuracy requirements for 579 validating a previously agreed quality can also vary extremely 580 with the customer demands. These requirements are usually 581 determined by the service level agreement (SLA). 583 The Sampling method and the parameters in use must be clearly 584 communicated to all applications that use the measurement data. 585 Only with this knowledge a correct interpretation of the 586 measurement results can be ensured. 588 Sampling methods can be characterized by the Sampling algorithm, 589 the trigger type used for starting a Sampling interval and the 590 length of the Sampling interval. These parameters are described 591 here in detail. The Sampling algorithm describes the basic 592 process for selection of samples. In accordance to [AmCa89] and 593 [ClPB93] we define the following basic Sampling processes: 595 5.1 Systematic Sampling 597 Systematic Sampling describes the process of selecting the start 598 points and the duration of the selection intervals according to 599 a deterministic function. This can be for instance the periodic 600 selection of every k-th element of a trace but also the 601 selection of all packets that arrive at pre-defined points in 602 time. Even if the selection process does not follow a periodic 603 function (e.g. if the time between the Sampling intervals varies 604 over time) we consider this as systematic Sampling as long as 605 the selection is deterministic. 607 The use of systematic Sampling always involves the risk of 608 biasing the results. If the systematics in the Sampling process 609 resemble systematics in the observed stochastic process 610 (occurrence of the characteristic of interest in the network), 611 there is a high probability that the estimation will be biased. 612 Systematics in the observed process might not be known in 613 advance. 615 Here only equally spaced schemes are considered, where triggers 616 for Sampling are periodic, either in time or in packet count. 617 All packets occurring in a selection interval (either in time or 618 packet count) beyond the trigger are selected. 620 Systematic count-based 621 In systematic count-based Sampling the start and stop triggers 622 for the Sampling interval are defined in accordance to the 623 spatial packet position (packet count). 625 Systematic time-based 626 In systematic time-based Sampling time-based start and stop 627 triggers are used to define the Sampling intervals. All packets 628 are selected that arrive at the Observation Point within the 629 time-intervals defined by the start and stop triggers (i.e. 630 arrival time of the packet is larger than the start time and 631 smaller than the stop time). 633 Both schemes are content-independent selection schemes. Content 634 dependent deterministic Selectors are categorized as filter. 636 5.2 Random Sampling 638 Random Sampling selects the starting points of the Sampling 639 intervals in accordance to a random process. The selection of 640 elements are independent experiments. With this, unbiased 641 estimations can be achieved. In contrast to systematic Sampling, 642 random Sampling requires the generation of random numbers. One 643 can differentiate two methods of random Sampling: 645 5.2.1 n-out-of-N Sampling 647 In n-out-of-N Sampling n elements are selected out of the parent 648 population that consists of N elements. One example would be to 649 generate n different random numbers in the range [1,N] and 650 select all packets which have a packet position equal to one of 651 the random numbers. For this kind of Sampling the Sample Size n 652 is fixed. 654 5.2.2 Probabilistic Sampling 656 In probabilistic Sampling the decision whether an element is 657 selected or not is made in accordance to a pre-defined selection 658 probability. An example would be to flip a coin for each packet 659 and select all packets for which the coin showed the head. For 660 this kind of Sampling the Sample Size can vary for different 661 trials. The selection probability does not necessarily has to be 662 the same for each packet. Therefore we distinguish between 663 uniform probabilistic Sampling (with the same selection 664 probability for all packets) and non-uniform probabilistic 665 Sampling (where the selection probability can vary for different 666 packets). 668 5.2.2.1 Uniform Probabilistic Sampling 670 For Uniform Probabilistic Sampling packets are selected 671 independently with a uniform probability p. This Sampling can be 672 count-driven, and is sometimes referred to as geometric random 673 Sampling, since the difference in count between successive 674 selected packets are independent random variables with a 675 geometric distribution of mean 1/p. A time-driven analog, 676 exponential random Sampling, has the time between triggers 677 exponentially distributed. 678 Both geometric and exponential random Sampling are examples of 679 what is known as additive random Sampling, defined as Sampling 680 where the intervals or counts between successive samples are 681 independent identically distributed random variable. 683 5.2.2.2 Non-Uniform Probabilistic Sampling 685 This is a variant of Probabilistic Sampling in which the 686 Sampling probabilities can depend on the selection process 687 input. This can be used to weight Sampling probabilities in 688 order e.g. to boost the chance of Sampling packets that are rare 689 but are deemed important. Unbiased estimators for quantitative 690 statistics are recovered by renormalization of sample values; 691 see [HT52]. 693 5.2.2.3 Non-Uniform Flow State Dependent Sampling 695 Another type of Sampling that can be classified as probabilistic 696 Non-Uniform is closely related to the flow concept as defined 697 in [IPFIX-REQ], and it is only used jointly with a flow 698 monitoring function (IPFIX metering process). Packets are 699 selected, dependent on a selection state. The point, here, is 700 that the selection state is determined also by the state of the 701 flow the packet belongs to and/or by the state of the other 702 flows currently being monitored by the associated flow 703 monitoring function. An example for such an algorithm is the 704 "sample and hold" method described in [EsVa01]: 706 - If a packet accounts for a flow record that already exists in 707 the IPFIX flow recording process, it is selected (i.e. the 708 flow record is updated) 709 - If a packet doesn't account to any existing flow record, it is 710 selected with probability p. If it has been selected a new 711 flow record has to be created. 713 A further algorithm that fits into the category of non-uniform 714 flow state dependent Sampling is described in [Moli03]. 716 This type of Sampling is content dependent because the 717 identification of the flow the packet belongs to requires 718 analyzing part of the packet content. If the packet is selected, 719 then it is passed as an input to the IPFIX monitoring function 720 (this is called "Local Export" in [PSAMP-FW] 721 Selecting the packet depending on the state of a flow cache is 722 useful when memory resources of the flow monitoring function are 723 scarce (i.e. there is no room to keep all the flows that have 724 been scheduled for monitoring). See [MolFl03] for a more 725 detailed description of the motivations for this type of 726 Sampling and the impact on the IPFIX metering. 728 5.2.2.4 Configuration of non-uniform probabilistic and flow-state 729 Sampling 731 Many different specific methods can be grouped under the terms 732 non-uniform probabilistic and flow state Sampling. Dependent on 733 the Sampling goal and the implemented scheme, a different number 734 and type of input parameters is required to configure such 735 scheme. 737 Some concrete proposals for such methods exist from the research 738 community (e.g. [EsVa01],[DuLT01],[Moli03]). Some of these 739 proposals are still in an early stage and need further 740 investigations to prove their usefulness and applicability. It 741 is not our aim to indicate preference amongst these methods. 742 Instead, we only describe here the basic methods and leave the 743 specification of explicit schemes and their parameters up to 744 vendors (e.g. as extension of the information model). 746 6. Filtering 748 Filtering is the deterministic selection of packets based on the 749 packet content, the treatment of the packet at the Observation 750 Point, or deterministic functions of these occurring in the 751 selection state. The packet is selected if these quantities fall 752 into a specified range. 753 The role of Filtering, as the word itself suggest, is to 754 separate all the packets having a certain property from those 755 not having it. A distinguishing characteristic from Sampling is 756 that the selection decision does not depend on the packet 757 position in time or in the space, or on a random process. 758 We identify and describe in the following three Filtering 759 techniques. The first two (Match Filtering and Hashing 760 Filtering) are stateless, and therefore can make their decision 761 based on the analysis of portion of the packet only. The other 762 (router state Filtering) requires to access state information 763 after the analysis of part of the packet and is therefore more 764 complex: its usage makes sense only in particular circumstances, 765 as described below. 767 6.1 Field Match Filtering 769 We here define a basic Filtering schemes based on the IPIFIX 770 flow definition. With this method a packet is selected if a 771 specific field in the packet equals a predefined value. Possible 772 filter fields are all IPFIX flow attributes specified in [IPFIX- 773 INFO]. Further fields can be defined by vendor specific 774 extensions. 775 A packet is selected if Field=Value. Masks and ranges are only 776 supported to the extend to which [IPFIX-INFO] allows them e.g. 778 by providing explicit fields like the netmasks for source and 779 destination addresses. AND operations are possible by 780 concatenating filters. OR operations are not supported with this 781 basic model. More sophisticated filters (e.g. supporting 782 bitmasks, ranges or OR operations etc.) can be realized as 783 vendor specific schemes. 785 6.2 Hash-based Filtering 787 A Hash Function h maps the packet content c, or some portion of 788 it, onto a Hash Range R. The packet is selected if h(c) is an 789 element of S, which is a subset of R called the Hash Selection 790 Range. Thus Hash-based Selection is indeed a particular case of 791 Filtering: the object is selected if c is in inv(h(S)). But for 792 desirable Hash Functions the inverse image inv(h(S)) will be 793 extremely complex, and hence h would not be expressible as, say, 794 a field match filter or a simple combination of these. 795 Hash-based selection has mainly two types of usage: it offers a 796 way to approximate random Sampling by using packet content to 797 generate pseudorandom variates, and a way to consistently select 798 subsets of packets that share a common property (e.g. at 799 different Observation Points). 801 In the following subsections we give more details about them. 802 However, both usages require that the Hash Functions has two 803 statistical properties. 805 First, the Hash Function h must have good mixing properties, in 806 the sense that small changes in the input (e.g. the flipping of 807 a single bit) cause large changes in the output (many bits 808 change). Then any local clump of values of c is spread widely 809 over R by h, and so the distribution of h(c) is fairly uniform 810 even if the distribution of c is not. Then the Sampling Fraction 811 is #S/#R, which can be tuned by choice of S. 813 The second desirable property depends more closely on the 814 statistics of the content c. In applications, the content c 815 comprises a number of distinct fields, c1 ... cm, e.g. source 816 and destination IP Address, IP identification, and TCP/UDP port 817 numbers (if present) for a packet. With a Hash Function 818 satisfying the first properties above, selection decisions will 819 appear uncorrelated with the contents of any individual field, 820 if the complementary fields are (i) sufficiently variable 821 themselves, and (ii) sufficiently uncorrelated with cj. 823 6.2.1 Application Examples for Hash-based Selection 825 6.2.1.1 Approximation of Random Sampling 826 Although pseudorandom number generators with well understood 827 properties have been developed, they may not be the method of 828 choice in settings where computational resources are scarce. A 829 convenient alternative is to use Hash Functions of packet 830 content as a source of randomness. The hash (suitably 831 renormalized) is a pseudorandom variate in the interval [0,1]. 832 Other schemes may use packet fields in iterators for 833 pseudorandom numbers. However, the statistical properties of an 834 ideal packet selection law (such as independent Sampling for 835 different packets, or independence on packet content) may not be 836 exactly rendered by an implementation, but only approximately 837 so. 839 Use of packet content to generate pseudorandom variates shares 840 with Non-uniform Probabilistic Sampling (see Section 3.1.2.2.2 841 above) the property that selection decisions depend on Packet 842 Content. However, there is a fundamental difference between the 843 two. In the former case the content determines pseudorandom 844 variates. In the latter case the content only determines the 845 selection probabilities: selection could then proceed e.g by use 846 of random variates obtained by an independent pseudorandom 847 number generator. 849 6.2.1.2 Trajectory Sampling and Consistent packet selection. 851 Trajectory Sampling is the consistent selection of a subset of 852 packets at either all of a set of Observation Points or none of 853 them. Trajectory Sampling is realized by Hash-based Selection if 854 all Observation Points in the set use a common Hash Function, 855 Hash Domain and selection range. The Hash Domain comprises all 856 or part of the packet content that is invariant along the packet 857 path. Fields such as Time-to-Live, which is decremented per hop, 858 and header CRC, which is recalculated per hop, are thus excluded 859 from the Hash Domain. The Hash Domain needs to be wider than 860 just a flow key, if packets are to be selected quasirandomly 861 within flows. 863 The trajectory (or path) followed by a packet is reconstructed 864 from PSAMP reports on it that reach a Collector. Reports on a 865 given packet originating from different observations points are 866 associated by matching a label from the reports. The label may 867 comprise that portion invariant packet content that is reported, 868 or possibly some digest of the invariant packet content that is 869 inserted into the packet report at the Observation Point. Such a 870 digest may be constructed by applying a second Hash Function 871 (distinct from that used for selection) to the invariant packet 872 content. The reconstruction of trajectories, and methods for 873 dealing with possible ambiguities due to label collisions 874 (identical labels reported for different packets) and potential 875 loss of reports in transmission, are dealt with in [DuGr01], 876 [DuGeGr02] and [DuGr04]. 878 Applications of trajectory Sampling include (i) estimation of 879 the network path matrix, i.e., the traffic intensities according 880 to network path, broken down by flow key; (ii) detection of 881 routing loops, as indicated by self-intersecting trajectories; 882 (iii) passive performance measurement: prematurely terminating 883 trajectories indicate packet loss, packet one way delay can be 884 determined if reports include (synchronized) timestamps of 885 packet arrival at the Observation Point; (iv) network attack 886 tracing, of the actual paths taken by attack packets with 887 spoofed source addresses. 889 6.2.2 Guarding Against Pitfalls and Vulnerabilities 891 A concern for Hash-based Selection is whether some large set of 892 related packets could have an Attained Sampling Fraction 893 significantly different from the Configured Sampling Fraction, 894 either (i) through unanticipated behavior in the Hash Function, 895 or (ii) because the packets had been deliberately crafted to 896 have this property. 898 The first point underlines the importance of using a Hash 899 Function with good mixing properties. Examples of such are CRC32 900 and Hash Functions based on modular arithmetic, see 6.4 in 901 [Knuth98]. The statistical properties of candidate Hash 902 Functions need to be evaluated, preferably on packet traces 903 before adoption for hash-based Sampling 905 Hash-based selection could be overloaded or evaded by an 906 attacker if the Hash Function and the selection range are both 907 known. A service provider could build a first defense keeping 908 the Hash Selection Range S private. Then, an attacker could not 909 determine whether a crafted packet is selected, but would still 910 know that a crafted set of packets all with the same hash is 911 either all selected or all not selected. A stronger defense is 912 to employ a parametrizable Hash Function and keep the parameter 913 private. Without knowledge of the parameter, a set of packets 914 with common hash value cannot be constructed. Examples of 915 parameters are the initial vector in CRC32, and moduli in hashes 916 based on modular arithmetic. 918 6.2.3 Recommended Hash Functions 920 We here indicate some Hash Functions that can be used for packet 921 selection. The description of the IPSX and BOB Hash Functions 922 can be found in the appendix. The CRC-32 function is described 923 in [crc32]. The comparison of hash-functions with regard to 924 collision probability, the randomness of the packet selection 925 (i.e. the uniformity of the distribution of values in the Hash 926 Range) and the speed of the functions is described in [MoND05]. 927 Although the uniformity has been checked for different traffic 928 traces, results cannot be generalized to arbitrary traffic. 929 Since the hash-based selection is a deterministic function on 930 the packet content, it can always be biased towards packets with 931 specific attributes. Furthermore, please note that all Hash 932 Functions were evaluated only for IPv4. 934 6.2.3.1 Hash Functions Suitable for Packet Selection 936 For hash-based packet selection, the most important requirements 937 for the Hash Function are high execution speed, because the 938 selection must operate at line rate, and the uniformity of the 939 distribution of values in the Hash Range in order to provide a 940 good emulation of a random packet selection. 942 The IPSX function is simple and easy to implement. The 943 evaluation results showed that the IPSX function is faster than 944 the BOB function. Nevertheless, IPSX is limited to 16 bytes 945 input. All investigated Hash Functions showed a poor uniformity 946 if only 16 bytes are used as input. Therefore, if a hash-based 947 packet selection is implemented, the BOB function SHOULD be used 948 for packet selection operations. Other functions, like IPSX, MAY 949 be used. 951 Input bytes for the Hash Function need to be invariant along the 952 path the packet is traveling. Only with this it is ensured that 953 the same packets are selected at different observation points. 954 Furthermore they should have a high variability between 955 different packets to generate a high variation in the Hash 956 Range. 958 6.2.3.2 Input Bytes for the BOB Hash Function for IPv4 960 All investigated Hash Functions showed a poor uniformity if only 961 16 bytes are used as input. If the input consists of 20 bytes 962 which include parts of the IP packet payload (i.e. everything 963 following the IP header) the uniformity was significantly 964 improved. If a hash-based selection with the BOB function is 965 used with IPv4 traffic, the following input bytes MUST be used. 967 - IP identification field 968 - Flags field 969 - Fragment offset 970 - Source IP address 971 - Destination IP address 972 - A configurable number of bytes from the IP payload, starting 973 at a configurable offset. 975 Please note that the uniformity increases with the number of 976 additional input bytes from the IP payload. The default setting 977 is to use 4 bytes from the IP payload. Using less bytes can lead 978 to a significant bias in the selection. 980 6.2.3.3 Input Bytes for the BOB Hash Function for IPv6 982 All investigated Hash Functions were evaluated only for IPv4. 983 Due to the IPv6 header fields and address structure it is 984 expected that there is less randomness in IPv6 packet headers 985 than in IPv4 headers. Nevertheless, the randomness of IPv6 986 traffic was not evaluated in the tests mentioned above. In 987 addition to this, IPv6 traffic profiles may change significantly 988 in future when IPv6 is used by a broader community. 989 If a hash-based selection with the BOB function is used with 990 IPv6 traffic, the following input bytes MUST be used. 992 - Payload length (2 bytes) 993 - Byte number 10,11,14,15,16 of the IPv6 source address 994 - Byte number 10,11,14,15,16 of the IPv6 destination address 995 - A configurable number of bytes from the IP payload, starting 996 at a configurable offset. It is recommended to use at least 4 997 bytes from the IP payload. 999 The payload itself is not changing during the path. Even if some 1000 routers process some extension headers they are not going to 1001 strip them from the packet. Therefore the payload length is 1002 invariant along the path. Furthermore it usually differs for 1003 different packets. 1004 The IPv6 address has 16 bytes. The first part is the network 1005 part and it contains low variation. The second part is the host 1006 part and contains higher variation. Therefore the second part of 1007 the address is used. Nevertheless, the uniformity has not been 1008 checked for IPv6 traffic. It is possible that the selection is 1009 biased towards packets with specific attributes. 1011 6.2.3.4 Hash Functions Suitable for Packet Digesting 1013 For digesting Packet Content for inclusion in a reported label, 1014 the most important property is a low collision frequency. A 1015 secondary requirement is the ability to accept variable length 1016 input, in order to allow inclusion of maximal amount of packet 1017 as input. Execution speed is of secondary importance, since the 1018 digest need only be formed from selected packets. 1020 For this purpose also the BOB function is recommended. Other 1021 functions (such as CRC-32) MAY be used. Among the functions 1022 capable of operating with variable length input BOB and CRC-32 1023 have the fastest execution, BOB being slightly faster. IPSX is 1024 not recommended for digesting because it has a significantly 1025 higher collision rate and takes only a fixed length input. 1027 6.3 Router State Filtering 1029 This class of filters selects a packet on the basis of router 1030 state conditions. The following list gives examples for such 1031 conditions. Conditions can be combined with AND operators. 1033 - Ingress interface at which the packet arrives equals a 1034 specified value 1035 - Egress interface to which the packet is routed equals a 1036 specified value 1037 - Packet violated Access Control List (ACL) on the router 1038 - Reverse Path Forwarding (RPF) failed for the packet 1039 - Resource Reservation is insufficient for the packet 1040 - No route found for the packet 1041 - Origin BGP AS [RFC1771] equals a specified value or lies 1042 within a given range 1043 - Destination BGP AS equals a specified value or lies within 1044 a given range 1046 Router architectural considerations may preclude some 1047 information concerning the packet treatment, e.g. routing state, 1048 being available at line rate for selection of packets. However, 1049 if selection not based on routing state has reduced down from 1050 line rate, subselection based on routing state may be feasible. 1052 7. Parameters for the Description of Selection Techniques 1054 This section gives an overview of different alternative 1055 selection schemes and their required parameters. In order to be 1056 compliant with PSAMP at least one of proposed schemes MUST be 1057 implemented. 1059 The decision whether to select a packet or not is based on a 1060 function which is performed when the packet arrives at the 1061 selection process. Packet selection schemes differ in the input 1062 parameters for the selection process and the functions they 1063 require to do the packet selection. The following table gives an 1064 overview. 1066 Scheme | input parameters | functions 1067 ---------------+------------------------+------------------- 1068 systematic | packet position | packet counter 1069 count-based | Sampling pattern | 1070 ---------------+------------------------+------------------- 1071 systematic | arrival time | clock or timer 1072 time-based | Sampling pattern | 1073 ---------------+------------------------+------------------- 1074 random | packet position | packet counter, 1075 n-out-of-N | Sampling pattern | random numbers 1076 | (random number list) | 1077 ---------------+------------------------+------------------- 1078 uniform | Sampling | random function 1079 probabilistic | probability | 1080 ---------------+------------------------+------------------- 1081 non-uniform |e.g. packet position, | selection function, 1082 probabilistic | packet content(parts) | probability calc. 1083 ---------------+------------------------+------------------- 1084 non-uniform |e.g. flow state, | selection function, 1085 flow-state | packet content(parts) | probability calc. 1086 ---------------+------------------------+------------------- 1087 field match | packet content(parts) | filter function 1088 ---------------+------------------------+------------------- 1089 hash-based | packet content(parts) | Hash Function 1090 ---------------+------------------------+------------------- 1091 router state | router state | router state 1092 | | discovery 1093 ---------------+------------------------+------------------- 1095 7.1 Description of Sampling Techniques 1097 In this section we define what elements are needed to describe 1098 the most common Sampling techniques. Here the selection function 1099 is pre-defined and given by the Selector ID. 1101 Sampler Description: 1102 SELECTOR_ID 1103 SELECTOR_TYPE 1104 SELECTOR_PARAMETERS 1105 ASSOCIATIONS 1107 Where: 1109 SELECTOR_ID: 1110 Unique ID for the packet sampler, calculated as combination of 1111 the ASSOCIATIONS and a local ID. 1113 SELECTOR_TYPE 1114 For Sampling processes the SELECTOR TYPE defines what Sampling 1115 algorithm is used. 1117 Values: Systematic Count-based | Systematic Time-based | Random 1118 n-out-of-N | Uniform Probabilistic | Non-uniform Probabilistic | 1119 Non-uniform Flow-state 1121 SELECTOR_PARAMETERS 1122 For Sampling processes the SELECTOR PARAMETERS define the input 1123 parameters for the process. Interval length in systematic 1124 Sampling means, that all packets that arrive in this interval 1125 are selected. The spacing parameter defines the spacing in time 1126 or number of packets between the end of one Sampling interval 1127 and the start of the next succeeding interval. 1129 Case n out of N: 1130 - Population size N, Sample size n 1132 Case Systematic Time Based: 1133 - Interval length (in usec), Spacing (in usec) 1135 Case Systematic Count Based: 1136 - Interval length(in packets), Spacing (in packets) 1138 Case Uniform Probabilistic (with equal probability per packet): 1139 - Sampling probability p 1141 Case Non-uniform Probabilistic: 1142 - Calculation function for Sampling probability p (see also 1143 section 5.2.2.4) 1145 Case flow state: 1146 - Information reported for flow state can be found in 1147 [MolFl03](see also section 5.2.2.4) 1149 ASSOCIATIONS 1150 The ASSOCIATIONS field describes the Observation Point and 1151 optionally the IPFIX processes to which the packet Selector is 1152 associated. If no IPFIX process is specified, the selector is 1153 applied to all IPFIX processes on the observation point. The 1154 STREAM ID denotes the origin of the data stream that is input to 1155 the selection function. It can be the Observation Point directly 1156 or the ID of another Selector. With this it is possible to 1157 define combined schemes. If the STREAM ID contains IDs from 1158 other Selectors, one can derive the original Observation Point 1159 from the Selector definitions of these specified Selectors. 1161 Values: 1163 With STREAM ID: Observation point ID AND List of SELECTOR_IDs 1165 7.2 Description of Filtering Techniques 1167 In this section we define what elements are needed to describe 1168 the most common Filtering techniques. The structure closely 1169 parallels the one presented for the Sampling techniques. 1171 Filter Description: 1172 SELECTOR_ID 1173 SELECTOR_TYPE 1174 SELECTOR_PARAMETERS 1175 ASSOCIATIONS 1177 Where: 1179 SELECTOR_ID: 1180 Unique ID for the packet filter. The ID can be calculated under 1181 consideration of the ASSOCIATIONS and a local ID. 1183 SELECTOR_TYPE 1184 For Filtering processes the SELECTOR TYPE defines what Filtering 1185 type is used. 1186 Values: Matching | Hashing | Router_state 1188 SELECTOR_PARAMETERS 1189 For Filtering processes the SELECTOR PARAMETERS define formally 1190 the common property of the packet being filtered. For the 1191 filters of type Matching and Hashing the definitions have a lot 1192 of points in common. 1194 Values: 1196 Case Matching 1197 - Information Element (from [IPFIX-INFO]) 1198 - Value (type in accordance to [IPFIX-INFO]) 1200 In case of multiple match criteria, multiple "case matching" 1201 have to be bound by a logical AND. 1203 Case Hashing: 1204 - Hash Domain (Input bits from packet) 1205 -
1206 - 1207 -
1208 - 1209 - 1210 - 1211 - Hash Function 1212 - Hash function name 1213 - Length of input key (eliminate 0x bytes) 1214 - Output value (length M and bitmask) 1215 - Hash Selection Range, as a list of non overlapping 1216 intervals [start value, end value] where value is in 1217 [0,2^M-1] 1218 - Additional parameters dependent on specific Hash 1219 Function (e.g. hash input bits (seed)) 1221 Notes to input bits for Case Hashing: 1222 - Input bits can be from header part only, from the payload 1223 part only or from both. 1224 - The bit specification, for the header part, can be 1225 specified for ipv4 or ipv6 only, or both 1226 - In case of ipv4, the bit specification is a sequence of 20 1227 Hexadecimal numbers [00,FF] specifying a 20 bytes bitmask 1228 to be applied to the header. 1229 - In case of ipv6, it is a sequence of 40 Hexadecimal numbers 1230 [00,FF] specifying a 40 bytes bitmask to be applied to the 1231 header 1232 - The bit specification, for the payload part, is a sequence 1233 of Hexadecimal numbers [00,FF] specifying the bitmask to be 1234 applied to the first N bytes of the payload, as specified 1235 by the previous field. In case the Hexadecimal number 1236 sequence is longer then N, only the first N numbers are 1237 considered. 1238 - In case the payload is shorter than N, the Hash Function 1239 cannot be applied. Other options, like padding with zeros, 1240 may be considered in the future. 1241 - A Hash Function cannot be defined on the options field of 1242 the ipv4 header, neither on stacked headers of ipv6. 1243 - The Hash Selection Range defines a range of hash-values 1244 (out of all possible results of the Hash-Operation). If the 1245 hash result for a specific packet falls in this range, the 1246 packet is selected. If the value is outside the range, the 1247 packet is not selected. E.g. if the selection interval 1248 specification is [1:3], [6:9] all packets are selected for 1249 which the hash result is 1,2,3,6,7,8, or 9. In all other 1250 cases the packet is not selected. 1252 Case Router State: 1254 - Ingress interface at which the packet arrives equals a 1255 specified value 1256 - Egress interface to which the packet is routed equals a 1257 specified value 1258 - Packet violated Access Control List (ACL) on the router 1259 - Reverse Path Forwarding (RPF) failed for the packet 1260 - Resource Reservation is insufficient for the packet 1261 - No route found for the packet 1262 - Origin AS equals a specified value or lies within a given 1263 range 1264 - Destination AS equals a specified value or lies within a 1265 given range 1267 Note to Case Router State: 1268 - All Router state entries can be linked by AND operators 1270 ASSOCIATIONS 1271 The ASSOCIATIONS field describes the Observation Point and 1272 optionally the IPFIX processes to which the packet Selector is 1273 associated. If no IPFIX process is specified, the Selector is 1274 applied to all IPFIX processes on the observation point. The 1275 STREAM ID denotes the origin of the data stream that is input to 1276 the selection function. It can be the Observation Point directly 1277 or the ID of another Selector. With this it is possible to 1278 define combined schemes. If the STREAM ID contains IDs from 1279 other Selectors, one can derive the original Observation Point 1280 from the Selector definitions of these specified Selectors. 1282 Values: 1284 With STREAM ID: Observation point ID AND List of SELECTOR_IDs 1286 8. Composite Techniques 1288 Composite schemes are realized by using the STREAM ID in the 1289 information models. The STREAM ID denotes from which Selectors 1290 the input stream originates. If multiple stream IDs are given, 1291 this means that the Selector operates on the packet stream 1292 simply resulting from the time superposition of the output of 1293 all the listed filters and samplers. Some examples of composite 1294 schemes are reported below. 1296 8.1 Cascaded Filtering->Sampling or Sampling->Filtering 1298 If a filter precedes a Sampling process the role of Filtering is 1299 to create a set of "parent populations" from a single stream 1300 that can then be fed independently to different Sampling 1301 functions, with different parameters tuned for the population 1302 itself (e.g. if streams of different intensity result from 1303 Filtering, it may be good to have different Sampling rates). If 1304 Filtering follows a Sampling process, the same Sampling Fraction 1305 and type is applied to the whole stream, independently of the 1306 relative size of the streams resulting from the Filtering 1307 function. Moreover, also packets not destined to be selected in 1308 the Filtering operation will "load" the Sampling function. So, 1309 in principle, Filtering before Sampling allows a more accurate 1310 tuning of the Sampling procedure, but if filters are too complex 1311 to work at full line rate (e.g. because they have to access 1312 router state information), Sampling before Filtering may be a 1313 need. 1315 8.2 Stratified Sampling 1317 Stratified Sampling is one example for using a composite 1318 technique. The basic idea behind stratified Sampling is to 1319 increase the estimation accuracy by using a-priori information 1320 about correlations of the investigated characteristic with some 1321 other characteristic that is easier to obtain. The a-priori 1322 information is used to perform an intelligent grouping of the 1323 elements of the parent population. With this a higher estimation 1324 accuracy can be achieved with the same Sample Size or the Sample 1325 Size can be reduced without reducing the estimation accuracy. 1327 Stratified Sampling divides the Sampling process into multiple 1328 steps. First, the elements of the parent population are grouped 1329 into subsets in accordance to a given characteristic. This 1330 grouping can be done in multiple steps. Then samples are taken 1331 from each subset. 1333 The stronger the correlation between the characteristic used to 1334 divide the parent population (stratification variable) and the 1335 characteristic of interest (for which an estimate is sought 1336 after), the easier is the consecutive Sampling process and the 1337 higher is the stratification gain. For instance if the dividing 1338 characteristic were equal to the investigated characteristic, 1339 each element of the sub-group would be a perfect representative 1340 of that characteristic. In this case it would be sufficient to 1341 take one arbitrary element out of each subgroup to get the 1342 actual distribution of the characteristic in the parent 1343 population. Therefore stratified Sampling can reduce the costs 1344 for the Sampling process (i.e. the number of samples needed to 1345 achieve a given level of confidence). 1347 For stratified Sampling one has to specify classification rules 1348 for grouping the elements into subgroups and the Sampling scheme 1349 that is used within the subgroups. The classification rules can 1350 be expressed by multiple filters. For the Sampling scheme within 1351 the subgroups the parameters have to be specified as described 1352 above. The use of stratified Sampling methods for measurement 1353 purposes is described for instance in [ClPB93] and [Zseb03]. 1355 9. Security Considerations 1357 Malicious users or attackers may wish to hide packets from 1358 service providers or network operators. For instance if packet 1359 Selectors are used for accounting or intrusion detection 1360 applications, users may want to prevent certain packets from 1361 being selected. If a deterministic Sampling scheme is used or a 1362 selection scheme that takes packet content into account, the 1363 user can shape or send packets in a way that they are less 1364 likely to be selected (see also section 6.2.2). Even if the 1365 selection function is unknown to the user, some insight into the 1366 function can be obtained by performing experiments with 1367 different packet sequences. This has to be taken into account 1368 when choosing an appropriate packet selection technique. 1370 Further security threats can occur if the configuration of 1371 Sampling parameters or the communication of Sampling parameters 1372 to the application is corrupted. This document only describes 1373 Sampling schemes that can be used for packet selection. It 1374 neither describes a mechanism how those parameters are 1375 configured nor how these parameters are communicated to the 1376 application. Therefore the security threats that originate from 1377 this kind of communication cannot be assessed with the 1378 information given in this document. 1380 10. Acknowledgements 1382 We would like to thank the PSAMP group, especially Benoit Claise 1383 and Stewart Bryant, for fruitful discussions and for 1384 proofreading the document. 1386 11. Normative References 1388 [PSAMP-FW] Nick Duffield (Ed.): A Framework for Packet 1389 Selection and Reporting, RFC XXXX [currently 1390 Internet Draft draft-ietf-psamp-framework-08, work 1391 in progress, October 2004] 1393 [PSAMP-MIB] T. Dietz, B. Claise: Definitions of Managed Objects 1394 for Packet Sampling, RFC XXXX. [Currently Internet 1395 Draft, draft-ietf-psamp-mib-03.txt, work in 1396 progress, July 2004.] 1398 [PSAMP-PROTO] B. Claise (Ed.): Packet Sampling (PSAMP) Protocol 1399 Specifications, RFC XXXX. [Currently Internet Draft 1400 draft-ietf-psamp-protocol-01.txt, work in progress, 1401 February 2004.] 1403 [PSAMP-INFO] T. Dietz, F. Dressler, G. Carle, B. Claise: 1404 Information Model for Packet Sampling Exports, RFC 1405 XXXX. [Currently Internet Draft, draft-ietf-psamp- 1406 info-02, July 2004] 1408 [IPFIX-INFO] J. Meyer, J. Quittek, S. Bryant: Information Model 1409 for IP Flow Information Export, RFC XXXX [Currently 1410 Internet Draft, draft-ietf-ipfix-info-04, July 1411 2004] 1413 [IPFIX-REQ] J. Quittek, T. Zseby, B. Claise, S. Zander: 1414 Requirements for IP Flow Information Export, RFC 1415 3917, October 2004 1417 12. Informative References 1419 [AmCa89] Paul D. Amer, Lillian N. Cassel: Management of 1420 Sampled Real-Time Network Measurements, 14th 1421 Conference on Local Computer Networks, October 1422 1989, Minneapolis, pages 62-68, IEEE, 1989 1424 [ClPB93] K.C. Claffy, George C. Polyzos, Hans-Werner Braun: 1425 Application of Sampling Methodologies to Network 1426 Traffic Characterization, Proceedings of ACM 1427 SIGCOMM'93, San Francisco, CA, USA, September 13 - 1428 17, 1993 1430 [CoGi98] I. Cozzani, S. Giordano: Traffic Sampling Methods 1431 for end-to-end QoS Evaluation in Large 1432 Heterogeneous Networks. Computer Networks and ISDN 1433 Systems, 30 (16-18), September 1998. 1435 [crc32] R. Braden, D. Borman, C. Partridge: Computing the 1436 Internet Checksum, RFC 1071, Sep. 1988 (updated by 1437 RFCs 1141 and 1624) 1439 [DuGeGr02] N.G. Duffield, A. Gerber, M. Grossglauser: 1440 Trajectory Engine: A Backend for Trajectory 1441 Sampling, IEEE Network Operations and Management 1442 Symposium 2002, Florence, Italy, April 15-19, 2002. 1444 [DuGr00] N.G. Duffield, M. Grossglauser: Trajectory Sampling 1445 for Direct Traffic Observation, Proceedings of ACM 1446 SIGCOMM 2000, Stockholm, Sweden, August 28 - 1447 September 1, 2000. 1449 [DuGr04] N. G. Duffield and M. Grossglauser: Trajectory 1450 Sampling with Unreliable Reporting, Proc IEEE 1451 Infocom 2004, Hong Kong, March 2004, 1453 [DuLT01] N.G. Duffield, C. Lund, and M. Thorup: Charging 1454 from Sampled Network Usage, ACM Internet 1455 Measurement Workshop IMW 2001, San Francisco, USA, 1456 November 1-2, 2001 1458 [EsVa01] C. Estan and G. Varghese: New Directions in Traffic 1459 Measurement and Accounting, ACM SIGCOMM Internet 1460 Measurement Workshop 2001, San Francisco (CA) Nov. 1461 2001 1463 [HT52] D.G. Horvitz and D.J. Thompson: A Generalization of 1464 Sampling without replacement from a Finite 1465 Universe. J. Amer. Statist. Assoc. Vol. 47, pp. 1466 663-685, 1952. 1468 [Jenk97] B. Jenkins: Algorithm Alley, Dr. Dobb's Journal, 1469 September 1997. 1470 http://burtleburtle.net/bob/hash/doobs.html 1472 [JePP92] Jonathan Jedwab, Peter Phaal, Bob Pinna: Traffic 1473 Estimation for the Largest Sources on a Network, 1474 Using Packet Sampling with Limited Storage, HP 1475 technical report, Managemenr, Mathematics and 1476 Security Department, HP Laboratories, Bristol, 1477 March 1992, 1478 http://www.hpl.hp.com/techreports/92/HPL-92-35.html 1480 [Knuth98] Donald E. Knuth: The Art of Computer Programming, 1481 Volume 3: Searching and Sorting, Addison Wesley, 1482 1998 1484 [MolFl03] M.Molina: Flow selection support in IPFIX, Internet 1485 Draft , work in 1486 progress, October 2003. 1488 [Moli03] M.Molina: A scalable and efficient methodology for 1489 flow monitoring in the internet, International 1490 Teletraffic Congress (ITC-18), Berlin, Sep. 2003 1492 [MoND05] M. Molina, S.Niccolini, N.G.Duffield: A Comparative 1493 Experimental Study of Hash Functions Applied to 1494 Packet Sampling. International Teletraffic Congress 1495 (ITC-19), Beijing, August 2005 1497 [RFC1771] Rekhter, Y. and T. Li: A Border Gateway Protocol 4 1498 (BGP-4), RFC 1771, March 1995. 1500 [Zseb03] T. Zseby: Stratification Strategies for Sampling- 1501 based Non-intrusive Measurement of One-way Delay. 1502 Proceedings of Passive and Active Measurement 1503 Workshop (PAM 20003), La Jolla, CA, USA, pp. 171- 1504 179, April 2003 1506 13. Authors' Addresses 1508 Tanja Zseby 1509 Fraunhofer Institute for Open Communication Systems 1510 Kaiserin-Augusta-Allee 31 1511 10589 Berlin 1512 Germany 1513 Phone: +49-30-34 63 7153 1514 Email: zseby@fokus.fhg.de 1516 Maurizio Molina 1517 DANTE 1518 City House 1519 126-130 Hills Road 1520 Cambridge CB21PQ 1521 United Kingdom 1522 Phone: +44 1223 371 300 1523 Email: maurizio.molina@dante.org.uk 1525 Nick Duffield 1526 AT&T Labs - Research 1527 Room B-139 1528 180 Park Ave 1529 Florham Park NJ 07932, USA 1530 Phone: +1 973-360-8726 1531 Email: duffield@research.att.com 1533 Saverio Niccolini 1534 Network Laboratories, NEC Europe Ltd. 1535 Kurfuerstenanlage 36 1536 69115 Heidelberg 1537 Germany 1538 Phone: +49-6221-9051118 1539 Email: saverio.niccolini@netlab.nec.de 1541 Fredric Raspall 1542 EPSC-UPC 1543 Dept. of Telematics 1544 Av. del Canal Olimpic, s/n 1545 Edifici C4 1546 E-08860 Castelldefels, Barcelona 1547 Spain 1548 Email: fredi@entel.upc.es 1550 14. Intellectual Property Statement 1552 The IETF has been notified by AT&T Corp. of intellectual 1553 property rights claimed in regard to some or all of the 1554 specification contained in this document. For more information, 1555 see http://www.ietf.org/ietf/IPR/att-ipr-draft-ietf-psamp- 1556 framework.txt 1558 The IETF takes no position regarding the validity or scope of 1559 any Intellectual Property Rights or other rights that might be 1560 claimed to pertain to the implementation or use of the 1561 technology described in this document or the extent to which any 1562 license under such rights might or might not be available; nor 1563 does it represent that it has made any independent effort to 1564 identify any such rights. Information on the procedures with 1565 respect to rights in RFC documents can be found in BCP 78 and 1566 BCP 79. 1568 Copies of IPR disclosures made to the IETF Secretariat and any 1569 assurances of licenses to be made available, or the result of an 1570 attempt made to obtain a general license or permission for the 1571 use of such proprietary rights by implementers or users of this 1572 specification can be obtained from the IETF on-line IPR 1573 repository at http://www.ietf.org/ipr. 1575 The IETF invites any interested party to bring to its attention 1576 any copyrights, patents or patent applications, or other 1577 proprietary rights that may cover technology that may be 1578 required to implement this standard. Please address the 1579 information to the IETF at ietf-ipr@ietf.org. 1581 15. Copyright Statement 1583 Copyright (C) The Internet Society (2005). This document is 1584 subject to the rights, licenses and restrictions contained in 1585 BCP 78, and except as set forth therein, the authors retain all 1586 their rights. 1588 16. Disclaimer 1590 This document and the information contained herein are provided 1591 on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE 1592 REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND 1593 THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, 1594 EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY 1595 THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY 1596 RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS 1597 FOR A PARTICULAR PURPOSE. 1599 17. Appendix: Hash Functions 1600 17.1 IP Shift-XOR (IPSX) Hash Function 1602 The IPSX Hash Function is tailored for acting on IP version 4 1603 packets. It exploits the structure of IP packet and in 1604 particular the variability expected to be exhibited within 1605 different fields of the IP packet in order to furnish a hash 1606 value with little apparent correlation with individual packet 1607 fields. Fields from the IPv4 and TCP/UDP headers are used as 1608 input. The IPSX Hash Function uses a small number of simple 1609 instructions. 1611 Input parameters: None 1613 Built-in parameters: None 1615 Output: The output of the IPSX is a 16 bit number 1617 Functioning: 1618 The functioning can be divided into two parts: input selection, 1619 which forms are composite input from various portions of the IP 1620 packet, followed by computation of the hash on the composite. 1622 Input Selection: 1623 The raw input is drawn from the first 20 bytes of the IP packet 1624 header and the first 8 bytes of the IP payload. If IP options 1625 are not used, the IP header has 20 bytes, and hence the two 1626 portions adjoin and comprise the first 28 bytes of the IP 1627 packet. We now use the raw input as 4 32-bit subportions of 1628 these 28 bytes. We specify the input by bit offsets from the 1629 start of IP header or payload. 1631 f1 = bits 32 to 63 of the IP header, comprising the IP 1632 identification field, flags, and fragment offset. 1634 f2 = bits 96 to 127 of the IP header, the source IP address. 1636 f3 = bits 128 to 159 of the IP header, the destination IP 1637 address. 1639 f4 = bits 32 to 63 of the IP payload. For a TCP packet, f4 1640 comprises the TCP sequence number followed by the message 1641 length. For a UDP packet f4 comprises the UDP checksum. 1643 Hash Computation: 1644 The hash is computed from f1, f2, f3 and f4 by a combination of 1645 XOR (^), right shift (>>) and left shift (<<) operations. The 1646 intermediate quantities h1, v1, v2 are 32-bit numbers. 1648 1. v1 = f1 ^ f2; 1649 2. v2 = f3 ^ f4; 1650 3. h1 = v1 << 8; 1651 4. h1 ^= v1 >> 4; 1652 5. h1 ^= v1 >> 12; 1653 6. h1 ^= v1 >> 16; 1654 7. h1 ^= v2 << 6; 1655 8. h1 ^= v2 << 10; 1656 9. h1 ^= v2 << 14; 1657 10. h1 ^= v2 >> 7; 1659 The output of the hash is the least significant 16 bits of h1. 1661 17.2 BOB Hash Function 1663 The BOB Hash Function is a Hash Function designed for having 1664 each bit of the input affecting every bit of the return value 1665 and using both 1-bit and 2-bit deltas to achieve the so called 1666 avalanche effect [Jenk97]. This function was originally built 1667 for hash table lookup with fast software implementation. 1669 Input Parameters: 1670 The input parameters of such a function are: 1671 - the length of the input string (key) to be hashed, in 1672 bytes. The elementary input blocks of Bob hash are the single 1673 bytes, therefore no padding is needed. 1674 - an init value (an arbitrary 32-bit number). 1676 Built in parameters: 1677 The Bob Hash uses the following built-in parameter: 1678 - the golden ratio (an arbitrary 32-bit number used in the hash 1679 function computation: its purpose is to avoid mapping all zeros 1680 to all zeros); 1682 Note: the mix sub-function (see mix (a,b,c) macro in the 1683 reference code in 3.2.4) has a number of parameters governing 1684 the shifts in the registers. The one presented is not the only 1685 possible choice. 1687 It is an open point whether these may be considered additional 1688 built-in parameters to specify at function configuration. 1690 Output. 1691 The output of the BOB function is a 32-bit number. It should be 1692 specified: 1693 - A 32 bit mask to apply to the output 1694 - The selection range as a list of non overlapping intervals 1695 [start value, end value] where value is in [0,2^32] 1696 Functioning: 1697 The hash value is obtained computing first an initialization of 1698 an internal state (composed of 3 32-bit numbers, called a, b, c 1699 in the reference code below), then, for each input byte of the 1700 key the internal state is combined by addition and mixed using 1701 the mix sub-function. Finally, the internal state mixed one last 1702 time and the third number of the state (c) is chosen as the 1703 return value. 1705 typedef unsigned long int ub4; /* unsigned 4-byte 1706 quantities */ 1707 typedef unsigned char ub1; /* unsigned 1-byte 1708 quantities */ 1710 #define hashsize(n) ((ub4)1<<(n)) 1711 #define hashmask(n) (hashsize(n)-1) 1713 /* ------------------------------------------------------ 1714 mix -- mix 3 32-bit values reversibly. 1715 For every delta with one or two bits set, and the deltas of 1716 all three high bits or all three low bits, whether the original 1717 value of a,b,c is almost all zero or is uniformly distributed, 1718 * If mix() is run forward or backward, at least 32 bits in 1719 a,b,c have at least 1/4 probability of changing. 1720 * If mix() is run forward, every bit of c will change between 1721 1/3 and 2/3 of the time. (Well, 22/100 and 78/100 for some 2- 1722 bit deltas.) mix() was built out of 36 single-cycle latency 1723 instructions in a structure that could supported 2x parallelism, 1724 like so: 1725 a -= b; 1726 a -= c; x = (c>>13); 1727 b -= c; a ^= x; 1728 b -= a; x = (a<<8); 1729 c -= a; b ^= x; 1730 c -= b; x = (b>>13); 1731 ... 1732 Unfortunately, superscalar Pentiums and Sparcs can't take 1733 advantage of that parallelism. They've also turned some of 1734 those single-cycle latency instructions into multi-cycle latency 1735 instructions 1737 ------------------------------------------------------------*/ 1739 #define mix(a,b,c) \ 1740 { \ 1741 a -= b; a -= c; a ^= (c>>13); \ 1742 b -= c; b -= a; b ^= (a<<8); \ 1743 c -= a; c -= b; c ^= (b>>13); \ 1744 a -= b; a -= c; a ^= (c>>12); \ 1745 b -= c; b -= a; b ^= (a<<16); \ 1746 c -= a; c -= b; c ^= (b>>5); \ 1747 a -= b; a -= c; a ^= (c>>3); \ 1748 b -= c; b -= a; b ^= (a<<10); \ 1749 c -= a; c -= b; c ^= (b>>15); \ 1750 } 1752 /* ----------------------------------------------------------- 1753 hash() -- hash a variable-length key into a 32-bit value 1754 k : the key (the unaligned variable-length array of bytes) 1755 len : the length of the key, counting by bytes 1756 initval : can be any 4-byte value 1757 Returns a 32-bit value. Every bit of the key affects every bit 1758 of the return value. Every 1-bit and 2-bit delta achieves 1759 avalanche. About 6*len+35 instructions. 1761 The best hash table sizes are powers of 2. There is no need to 1762 do mod a prime (mod is sooo slow!). If you need less than 32 1763 bits, use a bitmask. For example, if you need only 10 bits, do 1764 h = (h & hashmask(10)); 1765 In which case, the hash table should have hashsize(10) elements. 1767 If you are hashing n strings (ub1 **)k, do it like this: 1768 for (i=0, h=0; i= 12) 1794 { 1795 a += (k[0] +((ub4)k[1]<<8) +((ub4)k[2]<<16) 1796 +((ub4)k[3]<<24)); 1797 b += (k[4] +((ub4)k[5]<<8) +((ub4)k[6]<<16) 1798 +((ub4)k[7]<<24)); 1799 c += (k[8] +((ub4)k[9]<<8) 1800 +((ub4)k[10]<<16)+((ub4)k[11]<<24)); 1801 mix(a,b,c); 1802 k += 12; len -= 12; 1803 } 1805 /*---------------------------- handle the last 11 bytes */ 1806 c += length; 1807 switch(len) /* all the case statements fall through*/ 1808 { 1809 case 11: c+=((ub4)k[10]<<24); 1810 case 10: c+=((ub4)k[9]<<16); 1811 case 9 : c+=((ub4)k[8]<<8); 1812 /* the first byte of c is reserved for the length */ 1813 case 8 : b+=((ub4)k[7]<<24); 1814 case 7 : b+=((ub4)k[6]<<16); 1815 case 6 : b+=((ub4)k[5]<<8); 1816 case 5 : b+=k[4]; 1817 case 4 : a+=((ub4)k[3]<<24); 1818 case 3 : a+=((ub4)k[2]<<16); 1819 case 2 : a+=((ub4)k[1]<<8); 1820 case 1 : a+=k[0]; 1821 /* case 0: nothing left to add */ 1822 } 1823 mix(a,b,c); 1824 /*-------------------------------- report the result */ 1825 return c; 1826 }