Network Working Group                V. Paxson, Lawrence Berkeley National Lab
Internet Draft                           G. Almes, Advanced Network & Services
                                   J. Mahdavi, Pittsburgh Supercomputer Center
                                    M. Mathis, Pittsburgh Supercomputer Center
Expiration Date: July 1998                                        February 1998


                    Framework for IP Performance Metrics


1. Status of this Memo

This document is an Internet Draft.  Internet Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups.  Note that other groups may also distribute working documents as Internet Drafts.

Internet Drafts are draft documents valid for a maximum of six months, and may be updated, replaced, or obsoleted by other documents at any time.  It is inappropriate to use Internet Drafts as reference material or to cite them other than as ``work in progress''.

To learn the current status of any Internet Draft, please check the ``1id-abstracts.txt'' listing contained in the Internet Drafts shadow directories on ftp.is.co.za (Africa), nic.nordu.net (Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or ftp.isi.edu (US West Coast).

This memo provides information for the Internet community.  This memo does not specify an Internet standard of any kind.  Distribution of this memo is unlimited.


Table of Contents

1. STATUS OF THIS MEMO
2. INTRODUCTION
3. CRITERIA FOR IP PERFORMANCE METRICS
4. TERMINOLOGY FOR PATHS AND CLOUDS
5. FUNDAMENTAL CONCEPTS
   5.1 Metrics
   5.2 Measurement Methodology
   5.3 Measurements, Uncertainties, and Errors
6. METRICS AND THE ANALYTICAL FRAMEWORK
7. EMPIRICALLY SPECIFIED METRICS
8. TWO FORMS OF COMPOSITION
   8.1 Spatial Composition of Metrics
   8.2 Temporal Composition of Formal Models and Empirical Metrics
9. ISSUES RELATED TO TIME
   9.1 Clock Issues
   9.2 The Notion of "Wire Time"
10. SINGLETONS, SAMPLES, AND STATISTICS
   10.1 Methods of Collecting Samples
      10.1.1 Poisson Sampling
      10.1.2 Geometric Sampling
      10.1.3 Generating Poisson Sampling Intervals
   10.2 Self-Consistency
   10.3 Defining Statistical Distributions
   10.4 Testing For Goodness-of-Fit
11. AVOIDING STOCHASTIC METRICS
12. PACKETS OF TYPE P
13. INTERNET ADDRESSES VS. HOSTS
14. STANDARD-FORMED PACKETS
15. ACKNOWLEDGEMENTS
16. SECURITY CONSIDERATIONS
17. APPENDIX
18. REFERENCES
19. AUTHORS' ADDRESSES

2. Introduction

The purpose of this memo is to define a general framework for particular metrics to be developed by the IETF's IP Performance Metrics effort, begun by the Benchmarking Methodology Working Group (BMWG) of the Operational Requirements Area, and being continued by the IP Performance Metrics Working Group (IPPM) of the Transport Area.

We begin by laying out several criteria for the metrics that we adopt.  These criteria are designed to promote an IPPM effort that will maximize an accurate common understanding by Internet users and Internet providers of the performance and reliability both of end-to-end paths through the Internet and of specific 'IP clouds' that comprise portions of those paths.

We next define some Internet vocabulary that will allow us to speak clearly about Internet components such as routers, paths, and clouds.

We then define the fundamental concepts of 'metric' and 'measurement methodology', which allow us to speak clearly about measurement issues.
Given these concepts, we proceed to discuss the important issue of measurement uncertainties and errors, and develop a key, somewhat subtle notion of how they relate to the analytical framework shared by many aspects of the Internet engineering discipline.  We then introduce the notion of empirically defined metrics, and finish this part of the document with a general discussion of how metrics can be 'composed'.

The remainder of the document deals with a variety of issues related to defining sound metrics and methodologies: how to deal with imperfect clocks; the notion of 'wire time' as distinct from 'host time'; how to aggregate sets of singleton metrics into samples and derive sound statistics from those samples; why it is recommended to avoid thinking about Internet properties in probabilistic terms (such as the probability that a packet is dropped), since these terms often include implicit assumptions about how the network behaves; the utility of defining metrics in terms of packets of a generic type; the benefits of preferring IP addresses to DNS host names; and the notion of 'standard-formed' packets.  An appendix discusses the Anderson-Darling test for gauging whether a set of values matches a given statistical distribution, and gives C code for an implementation of the test.

In some sections of the memo, we will surround some commentary text with the brackets {Comment: ... }.  We stress that this commentary is only commentary, and is not itself part of the framework document or a proposal of particular metrics.  In some cases this commentary will discuss some of the properties of metrics that might be envisioned, but the reader should assume that any such discussion is intended only to shed light on points made in the framework document, and not to suggest any specific metrics.

3. Criteria for IP Performance Metrics

The overarching goal of the IP Performance Metrics effort is to achieve a situation in which users and providers of Internet transport service have an accurate common understanding of the performance and reliability of the Internet component 'clouds' that they use/provide.

To achieve this, performance and reliability metrics for paths through the Internet must be developed.  In several IETF meetings criteria for these metrics have been specified:
+ The metrics must be concrete and well-defined,
+ A methodology for a metric should have the property that it is repeatable: if the methodology is used multiple times under identical conditions, it should result in consistent measurements.
+ The metrics must exhibit no bias for IP clouds implemented with identical technology,
+ The metrics must exhibit understood and fair bias for IP clouds implemented with non-identical technology,
+ The metrics must be useful to users and providers in understanding the performance they experience or provide,
+ The metrics must avoid inducing artificial performance goals.

4. Terminology for Paths and Clouds

The following list defines terms that need to be precise in the development of path metrics.  We begin with low-level notions of 'host', 'link', and 'router', then proceed to define the notions that allow us to segment a path into relevant pieces.

host      A computer capable of communicating using the Internet protocols; includes "routers".

link      A single link-level connection between two (or more) hosts; includes leased lines, ethernets, frame relay clouds, etc.

router    A host which facilitates network-level communication between hosts by forwarding IP packets.

path      A sequence of the form < h0, l1, h1, ..., ln, hn >, where n >= 0, each hi is a host, each li is a link between hi-1 and hi, and each h1...hn-1 is a router.  A pair <li, hi> is termed a 'hop'.  In an appropriate operational configuration, the links and routers in the path facilitate network-layer communication of packets from h0 to hn.  Note that a path is a unidirectional concept.

subpath   Given a path, a subpath is any subsequence of the given path which is itself a path.  (Thus, the first and last element of a subpath is a host.)

cloud     An undirected (possibly cyclic) graph whose vertices are routers and whose edges are links that connect pairs of routers.  Formally, ethernets, frame relay clouds, and other links that connect more than two routers are modelled as fully-connected meshes of graph edges.  Note that to connect to a cloud means to connect to a router of the cloud over a link; this link is not itself part of the cloud.

exchange  A special case of a link, an exchange directly connects either a host to a cloud and/or one cloud to another cloud.

cloud subpath
          A subpath of a given path, all of whose hosts are routers of a given cloud.

path digest
          A sequence of the form < h0, e1, C1, ..., en, hn >, where n >= 0, h0 and hn are hosts, each e1 ... en is an exchange, and each C1 ... Cn-1 is a cloud subpath.

5. Fundamental Concepts

5.1. Metrics

In the operational Internet, there are several quantities related to the performance and reliability of the Internet that we'd like to know the value of.  When such a quantity is carefully specified, we term the quantity a metric.  We anticipate that there will be separate RFCs for each metric (or for each closely related group of metrics).

In some cases, there might be no obvious means to effectively measure the metric; this is allowed, and even understood to be very useful in some cases.  It is required, however, that the specification of the metric be as clear as possible about what quantity is being specified.  Thus, difficulty in practical measurement is sometimes allowed, but ambiguity in meaning is not.

Each metric will be defined in terms of standard units of measurement.  The international metric system will be used, with the following points specifically noted:
+ When a unit is expressed in simple meters (for distance/length) or seconds (for duration), appropriate related units based on thousands or thousandths of acceptable units are acceptable.  Thus, distances expressed in kilometers (km), durations expressed in milliseconds (ms), or microseconds (us) are allowed, but not centimeters (because the prefix is not in terms of thousands or thousandths).
+ When a unit is expressed in a combination of units, appropriate related units based on thousands or thousandths of acceptable units are acceptable, but all such thousands/thousandths must be grouped at the beginning.  Thus, kilometers per second (km/s) is allowed, but meters per millisecond is not.
+ The unit of information is the bit.
+ When metric prefixes are used with bits or with combinations including bits, those prefixes will have their metric meaning (related to decimal 1000), and not the meaning conventional with computer storage (related to decimal 1024).  In any RFC that defines a metric whose units include bits, this convention will be followed and will be repeated to ensure clarity for the reader.
+ When a time is given, it will be expressed in UTC.

Note that these points apply to the specifications for metrics and not, for example, to packet formats where octets will likely be used in preference/addition to bits.

Finally, we note that some metrics may be defined purely in terms of other metrics; such metrics are called 'derived metrics'.

5.2. Measurement Methodology

For a given set of well-defined metrics, a number of distinct measurement methodologies may exist.  A partial list includes:
+ Direct measurement of a performance metric using injected test traffic.  Example: measurement of the round-trip delay of an IP packet of a given size over a given route at a given time.
+ Projection of a metric from lower-level measurements.  Example: given accurate measurements of propagation delay and bandwidth for each step along a path, projection of the complete delay for the path for an IP packet of a given size.
+ Estimation of a constituent metric from a set of more aggregated measurements.  Example: given accurate measurements of delay for a given one-hop path for IP packets of different sizes, estimation of propagation delay for the link of that one-hop path.
+ Estimation of a given metric at one time from a set of related metrics at other times.  Example: given an accurate measurement of flow capacity at a past time, together with a set of accurate delay measurements for that past time and the current time, and given a model of flow dynamics, estimate the flow capacity that would be observed at the current time.

This list is by no means exhaustive.  The purpose is to point out the variety of measurement techniques.

When a given metric is specified, a given measurement approach might be noted and discussed.  That approach, however, is not formally part of the specification.

A methodology for a metric should have the property that it is repeatable: if the methodology is used multiple times under identical conditions, it should result in consistent measurements.

Backing off a little from the word 'identical' in the previous paragraph, we could more accurately use the word 'continuity' to describe a property of a given methodology: a methodology for a given metric exhibits continuity if, for small variations in conditions, it results in small variations in the resulting measurements.  Slightly more precisely, for every positive epsilon, there exists a positive delta, such that if two sets of conditions are within delta of each other, then the resulting measurements will be within epsilon of each other.  At this point, this should be taken as a heuristic driving our intuition about one kind of robustness property rather than as a precise notion.

A metric that has at least one methodology that exhibits continuity is said itself to exhibit continuity.

Note that some metrics, such as hop-count along a path, are integer-valued and therefore cannot exhibit continuity in quite the sense given above.

Note further that, in practice, it may not be practical to know (or be able to quantify) the conditions relevant to a measurement at a given time.  For example, since the instantaneous load (in packets to be served) at a given router in a high-speed wide-area network can vary widely over relatively brief periods and will be very hard for an external observer to quantify, various statistics of a given metric may be more repeatable, or may better exhibit continuity.  In that case those particular statistics should be specified when the metric is specified.

Finally, some measurement methodologies may be 'conservative' in the sense that the act of measurement does not modify, or only slightly modifies, the value of the performance metric the methodology attempts to measure.  {Comment: for example, in a wide-area high-speed network under modest load, a test using several small 'ping' packets to measure delay would likely not interfere (much) with the delay properties of that network as observed by others.  The corresponding statement about tests using a large flow to measure flow capacity would likely fail.}

5.3. Measurements, Uncertainties, and Errors

Even the very best measurement methodologies for the very most well behaved metrics will exhibit errors.  Those who develop such measurement methodologies, however, should strive to:
+ minimize their uncertainties/errors,
+ understand and document the sources of uncertainty/error, and
+ quantify the amounts of uncertainty/error.

For example, when developing a method for measuring delay, understand how any errors in your clocks introduce errors into your delay measurement, and quantify this effect as well as you can.  In some cases, this will result in a requirement that a clock be of at least a certain quality if it is to be used to make a certain measurement.

As a second example, consider the timing error due to measurement overheads within the computer making the measurement, as opposed to delays due to the Internet component being measured.  The former is a measurement error, while the latter reflects the metric of interest.  Note that one technique that can help avoid this overhead is the use of a packet filter/sniffer, running on a separate computer that records network packets and timestamps them accurately (see the discussion of 'wire time' below).  The resulting trace can then be analysed to assess the test traffic, minimising the effect of measurement host delays, or at least allowing those delays to be accounted for.  We note that this technique may prove beneficial even if the packet filter/sniffer runs on the same machine, because such measurements generally provide 'kernel-level' timestamping as opposed to less-accurate 'application-level' timestamping.

Finally, we note that derived metrics (defined above) or metrics that exhibit spatial or temporal composition (defined below) offer particular occasion for the analysis of measurement uncertainties, namely how the uncertainties propagate (conceptually) due to the derivation or composition.

6. Metrics and the Analytical Framework

As the Internet has evolved from the early packet-switching studies of the 1960s, the Internet engineering community has evolved a common analytical framework of concepts.  This analytical framework, or A-frame, used by designers and implementers of protocols, by those involved in measurement, and by those who study computer network performance using the tools of simulation and analysis, has great advantage to our work.  A major objective here is to generate network characterizations that are consistent in both analytical and practical settings, since this will maximize the chances that non-empirical network study can be better correlated with, and used to further our understanding of, real network behavior.

Whenever possible, therefore, we would like to develop and leverage off of the A-frame.  Thus, whenever a metric to be specified is understood to be closely related to concepts within the A-frame, we will attempt to specify the metric in the A-frame's terms.  In such a specification we will develop the A-frame by precisely defining the concepts needed for the metric, then leverage off of the A-frame by defining the metric in terms of those concepts.

Such a metric will be called an 'analytically specified metric' or, more simply, an analytical metric.

{Comment: Examples of such analytical metrics might include:

propagation time of a link
     The time, in seconds, required by a single bit to travel from the output port on one Internet host across a single link to another Internet host.

bandwidth of a link for packets of size k
     The capacity, in bits/second, where only those bits of the IP packet are counted, for packets of size k bytes.

route
     The path, as defined in Section 4, from A to B at a given time.

hop count of a route
     The value 'n' of the route path.
}

Note that we make no a priori list of just what A-frame concepts will emerge in these specifications, but we do encourage their use and urge that they be carefully specified so that, as our set of metrics develops, so will a specified set of A-frame concepts technically consistent with each other and consonant with the common understanding of those concepts within the general Internet community.

These A-frame concepts will be intended to abstract from actual Internet components in such a way that:
+ the essential function of the component is retained,
+ properties of the component relevant to the metrics we aim to create are retained,
+ a subset of these component properties are potentially defined as analytical metrics, and
+ those properties of actual Internet components not relevant to defining the metrics we aim to create are dropped.

For example, when considering a router in the context of packet forwarding, we might model the router as a component that receives packets on an input link, queues them on a FIFO packet queue of finite size, employs tail-drop when the packet queue is full, and forwards them on an output link.  The transmission speed (in bits/second) of the input and output links, the latency in the router (in seconds), and the maximum size of the packet queue (in bits) are relevant analytical metrics.

In some cases, such analytical metrics used in relation to a router will be very closely related to specific metrics of the performance of Internet paths.  For example, an obvious formula (L + P/B) involving the latency in the router (L), the packet size (in bits) (P), and the transmission speed of the output link (B) might closely approximate the increase in packet delay due to the insertion of a given router along a path.
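
As an illustrative sketch only (the function and the example numbers below are ours, not part of any metric definition), this approximation can be computed directly from the three analytical metrics just named:

   #include <stdio.h>

   /* Approximate increase in packet delay caused by inserting one router
    * along a path, per the router model above:
    *     added delay = L + P/B
    * where L is the router latency (seconds), P the packet size (bits),
    * and B the transmission speed of the output link (bits/second). */
   static double added_router_delay(double latency_s, double packet_bits,
                                    double out_link_bps)
   {
       return latency_s + packet_bits / out_link_bps;
   }

   int main(void)
   {
       /* Hypothetical example: 100 us of router latency, a 1500-byte
        * (12000-bit) packet, and a 10 Mbit/s output link. */
       double d = added_router_delay(100e-6, 12000.0, 10e6);
       printf("added delay: %g ms\n", d * 1e3);   /* prints 1.3 ms */
       return 0;
   }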

We stress, however, that well-chosen and well-specified A-frame concepts and their analytical metrics will support more general metric creation efforts in less obvious ways.

{Comment: for example, when considering the flow capacity of a path, it may be of real value to be able to model each of the routers along the path as packet forwarders as above.  Techniques for estimating the flow capacity of a path might use the maximum packet queue size as a parameter in decidedly non-obvious ways.  For example, as the maximum queue size increases, so will the ability of the router to continuously move traffic along an output link despite fluctuations in traffic from an input link.  Estimating this increase, however, remains a research topic.}

Note that, when we specify A-frame concepts and analytical metrics, we will inevitably make simplifying assumptions.  The key role of these concepts is to abstract the properties of the Internet components relevant to given metrics.  Judgement is required to avoid making assumptions that bias the modeling and metric effort toward one kind of design.

{Comment: for example, routers might not use tail-drop, even though tail-drop might be easier to model analytically.}

Finally, note that different elements of the A-frame might well make different simplifying assumptions.  For example, the abstraction of a router used to further the definition of path delay might treat the router's packet queue as a single FIFO queue, but the abstraction of a router used to further the definition of the handling of an RSVP-enabled packet might treat the router's packet queue as supporting bounded delay -- a contradictory assumption.  This is not to say that we make contradictory assumptions at the same time, but that two different parts of our work might refine the simpler base concept in two divergent ways for different purposes.

{Comment: in more mathematical terms, we would say that the A-frame taken as a whole need not be consistent; but the set of particular A-frame elements used to define a particular metric must be.}

7. Empirically Specified Metrics

There are useful performance and reliability metrics that do not fit so neatly into the A-frame, usually because the A-frame lacks the detail or power for dealing with them.  For example, "the best flow capacity achievable along a path using an RFC-2001-compliant TCP" would be good to be able to measure, but we have no analytical framework of sufficient richness to allow us to cast that flow capacity as an analytical metric.

These notions can still be well specified by instead describing a reference methodology for measuring them.

Such a metric will be called an 'empirically specified metric', or more simply, an empirical metric.

Such empirical metrics should have three properties:
+ we should have a clear definition for each in terms of Internet components,
+ we should have at least one effective means to measure them, and
+ to the extent possible, we should have a (necessarily incomplete) understanding of the metric in terms of the A-frame, so that we can use our measurements to reason about the performance and reliability of A-frame components and of aggregations of A-frame components.

8. Two Forms of Composition

8.1. Spatial Composition of Metrics

In some cases, it may be realistic and useful to define metrics in such a fashion that they exhibit spatial composition.

By spatial composition, we mean a characteristic of some path metrics, in which the metric as applied to a (complete) path can also be defined for various subpaths, and in which the appropriate A-frame concepts for the metric suggest useful relationships between the metric applied to these various subpaths (including the complete path, the various cloud subpaths of a given path digest, and even single routers along the path).  The effectiveness of spatial composition depends:
+ on the usefulness in analysis of these relationships as applied to the relevant A-frame components, and
+ on the practical use of the corresponding relationships as applied to metrics and to measurement methodologies.

{Comment: for example, consider some metric for delay of a 100-byte packet across a path P, and consider further a path digest < h0, e1, C1, ..., en, hn > of P.  The definition of such a metric might include a conjecture that the delay across P is very nearly the sum of the corresponding metric across the exchanges (ei) and clouds (Ci) of the given path digest.  The definition would further include a note on how a corresponding relation applies to relevant A-frame components, both for the path P and for the exchanges and clouds of the path digest.}
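
Written symbolically (our notation, not part of any metric definition; D(x) here stands for the conjectured delay metric applied to a path element x, and the ei and Ci are as in the path digest above), the conjectured relationship is approximately

   D(P) \approx \sum_{i=1}^{n} D(e_i) + \sum_{i=1}^{n-1} D(C_i)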

When the definition of a metric includes a conjecture that the metric across the path is related to the metric across the subpaths of the path, that conjecture constitutes a claim that the metric exhibits spatial composition.  The definition should then include:
+ the specific conjecture applied to the metric,
+ a justification of the practical utility of the composition in terms of making accurate measurements of the metric on the path,
+ a justification of the usefulness of the composition in terms of making analysis of the path using A-frame concepts more effective, and
+ an analysis of how the conjecture could be incorrect.

8.2. Temporal Composition of Formal Models and Empirical Metrics

In some cases, it may be realistic and useful to define metrics in such a fashion that they exhibit temporal composition.

By temporal composition, we mean a characteristic of some path metrics, in which the metric as applied to a path at a given time T is also defined for various times t0 < t1 < ... < tn < T, and in which the appropriate A-frame concepts for the metric suggest useful relationships between the metric applied at times t0, ..., tn and the metric applied at time T.  The effectiveness of temporal composition depends:
+ on the usefulness in analysis of these relationships as applied to the relevant A-frame components, and
+ on the practical use of the corresponding relationships as applied to metrics and to measurement methodologies.

{Comment: for example, consider a metric for the expected flow capacity across a path P during the five-minute period surrounding the time T, and suppose further that we have the corresponding values for each of the four previous five-minute periods t0, t1, t2, and t3.  The definition of such a metric might include a conjecture that the flow capacity at time T can be estimated from a certain kind of extrapolation from the values of t0, ..., t3.  The definition would further include a note on how a corresponding relation applies to relevant A-frame components.

Note: any (spatial or temporal) compositions involving flow capacity are likely to be subtle, and temporal compositions are generally more subtle than spatial compositions, so the reader should understand that the foregoing example is intentionally naive.}

When the definition of a metric includes a conjecture that the metric across the path at a given time T is related to the metric across the path for a set of other times, that conjecture constitutes a claim that the metric exhibits temporal composition.  The definition should then include:
+ the specific conjecture applied to the metric,
+ a justification of the practical utility of the composition in terms of making accurate measurements of the metric on the path, and
+ a justification of the usefulness of the composition in terms of making analysis of the path using A-frame concepts more effective.

9. Issues related to Time

9.1. Clock Issues

Measurements of time lie at the heart of many Internet metrics.  Because of this, it will often be crucial when designing a methodology for measuring a metric to understand the different types of errors and uncertainties introduced by imperfect clocks.  In this section we define terminology for discussing the characteristics of clocks and touch upon related measurement issues which need to be addressed by any sound methodology.

The Network Time Protocol (NTP; RFC 1305) defines a nomenclature for discussing clock characteristics, which we will also use when appropriate [Mi92].  The main goal of NTP is to provide accurate timekeeping over fairly long time scales, such as minutes to days, while for measurement purposes often what is more important is short-term accuracy, between the beginning of the measurement and the end, or over the course of gathering a body of measurements (a sample).  This difference in goals sometimes leads to different definitions of terminology as well, as discussed below.

To begin, we define a clock's "offset" at a particular moment as the difference between the time reported by the clock and the "true" time as defined by UTC.  If the clock reports a time Tc and the true time is Tt, then the clock's offset is Tc - Tt.

We will refer to a clock as "accurate" at a particular moment if the clock's offset is zero, and more generally a clock's "accuracy" is how close the absolute value of the offset is to zero.  For NTP, accuracy also includes a notion of the frequency of the clock; for our purposes, we instead incorporate this notion into that of "skew", because we define accuracy in terms of a single moment in time rather than over an interval of time.

A clock's "skew" at a particular moment is the frequency difference (first derivative of its offset with respect to true time) between the clock and true time.

As noted in RFC 1305, real clocks exhibit some variation in skew.  That is, the second derivative of the clock's offset with respect to true time is generally non-zero.  In keeping with RFC 1305, we define this quantity as the clock's "drift".

A clock's "resolution" is the smallest unit by which the clock's time is updated.  It gives a lower bound on the clock's uncertainty.  (Note that clocks can have very fine resolutions and yet be wildly inaccurate.)  Resolution is defined in terms of seconds.  However, resolution is relative to the clock's reported time and not to true time, so for example a resolution of 10 ms only means that the clock updates its notion of time in 0.01 second increments, not that this is the true amount of time between updates.

{Comment: Systems differ on how an application interface to the clock reports the time on subsequent calls during which the clock has not advanced.  Some systems simply return the same unchanged time as given for previous calls.  Others may add a small increment to the reported time to maintain monotone increasing timestamps.  For systems that do the latter, we do *not* consider these small increments when defining the clock's resolution.  They are instead an impediment to assessing the clock's resolution, since a natural method for doing so is to repeatedly query the clock to determine the smallest non-zero difference in reported times.}

It is expected that a clock's resolution changes only rarely (for example, due to a hardware upgrade).

There are a number of interesting metrics for which some natural measurement methodologies involve comparing times reported by two different clocks.  An example is one-way packet delay [AK96].  Here, the time required for a packet to travel through the network is measured by comparing the time reported by a clock at one end of the packet's path, corresponding to when the packet first entered the network, with the time reported by a clock at the other end of the path, corresponding to when the packet finished traversing the network.

We are thus also interested in terminology for describing how two clocks C1 and C2 compare.  To do so, we introduce terms related to those above in which the notion of "true time" is replaced by the time as reported by clock C1.  For example, clock C2's offset relative to C1 at a particular moment is Tc2 - Tc1, the instantaneous difference in time reported by C2 and C1.  To disambiguate between the use of the terms to compare two clocks versus the use of the terms to compare to true time, we will in the former case use the phrase "relative".  So the offset defined earlier in this paragraph is the "relative offset" between C2 and C1.

When comparing clocks, the analog of "resolution" is not "relative resolution", but instead "joint resolution", which is the sum of the resolutions of C1 and C2.  The joint resolution then indicates a conservative lower bound on the accuracy of any time intervals computed by subtracting timestamps generated by one clock from those generated by the other.

If two clocks are "accurate" with respect to one another (their relative offset is zero), we will refer to the pair of clocks as "synchronized".  Note that clocks can be highly synchronized yet arbitrarily inaccurate in terms of how well they tell true time.  This point is important because for many Internet measurements, synchronization between two clocks is more important than the accuracy of the clocks.  This is somewhat true of skew, too: as long as the absolute skew is not too great, minimizing relative skew is more important, because relative skew can induce systematic trends in packet transit times measured by comparing timestamps produced by the two clocks.

These distinctions arise because for Internet measurement what is often most important are differences in time as computed by comparing the output of two clocks.  The process of computing the difference removes any error due to clock inaccuracies with respect to true time; but it is crucial that the differences themselves accurately reflect differences in true time.
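
As a hedged illustration of how these terms interact (the function, parameter names, and numbers below are hypothetical and not part of any defined metric), a one-way delay computed from timestamps taken by two different clocks might be corrected for their relative offset and bounded using their joint resolution:

   #include <stdio.h>

   /* Hypothetical sketch: compute a one-way delay from a send timestamp
    * taken by clock C1 and a receive timestamp taken by clock C2, given
    * an estimate of C2's offset relative to C1 and the two clocks'
    * resolutions.  All values are in seconds. */
   struct delay_estimate {
       double delay;        /* delay corrected for the relative offset */
       double uncertainty;  /* conservative bound from the joint resolution */
   };

   static struct delay_estimate
   one_way_delay(double t_send_c1, double t_recv_c2,
                 double rel_offset,          /* estimated Tc2 - Tc1 */
                 double res_c1, double res_c2)
   {
       struct delay_estimate e;

       /* Remove the estimated relative offset between the two clocks. */
       e.delay = (t_recv_c2 - rel_offset) - t_send_c1;

       /* The joint resolution (sum of the two resolutions) gives a
        * conservative lower bound on the accuracy of the computed
        * interval; any uncertainty in the relative offset estimate
        * itself would have to be added on top of this. */
       e.uncertainty = res_c1 + res_c2;
       return e;
   }

   int main(void)
   {
       struct delay_estimate e =
           one_way_delay(10.000000, 10.052300, 0.002000, 0.001, 0.001);
       printf("one-way delay %.6f s, uncertainty >= %.6f s\n",
              e.delay, e.uncertainty);
       return 0;
   }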

Measurement methodologies will often begin with the step of assuring that two clocks are synchronized and have minimal skew and drift.  {Comment: An effective way to assure these conditions (and also clock accuracy) is by using clocks that derive their notion of time from an external source, rather than only the host computer's clock.  (These latter are often subject to large errors.)  It is further preferable that the clocks directly derive their time, for example by having immediate access to a GPS (Global Positioning System) unit.}

Two important concerns arise if the clocks indirectly derive their time using a network time synchronization protocol such as NTP:
+ First, NTP's accuracy depends in part on the properties (particularly delay) of the Internet paths used by the NTP peers, and these might be exactly the properties that we wish to measure, so it would be unsound to use NTP to calibrate such measurements.
+ Second, NTP focuses on clock accuracy, which can come at the expense of short-term clock skew and drift.  For example, when a host's clock is indirectly synchronized to a time source, if the synchronization intervals occur infrequently, then the host will sometimes be faced with the problem of how to adjust its current, incorrect time, Ti, with a considerably different, more accurate time it has just learned, Ta.  Two general ways in which this is done are to either immediately set the current time to Ta, or to adjust the local clock's update frequency (hence, its skew) so that at some point in the future the local time Ti' will agree with the more accurate time Ta'.  The first mechanism introduces discontinuities and can also violate common assumptions that timestamps are monotone increasing.  If the host's clock is set backward in time, sometimes this can be easily detected.  If the clock is set forward in time, this can be harder to detect.
The skew induced by the second mechanism can lead to considerable inaccuracies when computing differences in time, as discussed above.

To illustrate why skew is a crucial concern, consider samples of one-way delays between two Internet hosts made at one minute intervals.  The true transmission delay between the hosts might plausibly be on the order of 50 ms for a transcontinental path.  If the skew between the two clocks is 0.01%, that is, 1 part in 10,000, then after 10 minutes of observation the error introduced into the measurement is 60 ms.  Unless corrected, this error is enough to completely wipe out any accuracy in the transmission delay measurement.  Finally, we note that assessing skew errors between unsynchronized network clocks is an open research area.  (See [Pa97] for a discussion of detecting and compensating for these sorts of errors.)  This shortcoming makes use of a solid, independent clock source such as GPS especially desirable.

9.2. The Notion of "Wire Time"

Internet measurement is often complicated by the use of Internet hosts themselves to perform the measurement.  These hosts can introduce delays, bottlenecks, and the like that are due to hardware or operating system effects and have nothing to do with the network behavior we would like to measure.  This problem is particularly acute when timestamping of network events occurs at the application level.

In order to provide a general way of talking about these effects, we introduce two notions of "wire time".  These notions are only defined in terms of an Internet host H observing an Internet link L at a particular location:
+ For a given packet P, the 'wire arrival time' of P at H on L is the first time T at which any bit of P has appeared at H's observational position on L.
+ For a given packet P, the 'wire exit time' of P at H on L is the first time T at which all the bits of P have appeared at H's observational position on L.

Note that intrinsic to the definition is the notion of where on the link we are observing.  This distinction is important because for large-latency links, we may obtain very different times depending on exactly where we are observing the link.  We could allow the observational position to be an arbitrary location along the link; however, we define it to be in terms of an Internet host because we anticipate in practice that, for IPPM metrics, all such timing will be constrained to be performed by Internet hosts, rather than specialized hardware devices that might be able to monitor a link at locations where a host cannot.  This definition also takes care of the problem of links that are composed of multiple physical channels.  Because these multiple channels are not visible at the IP layer, they cannot be individually observed in terms of the above definitions.

It is possible, though one hopes uncommon, that a packet P might make multiple trips over a particular link L, due to a forwarding loop.  These trips might even overlap, depending on the link technology.  Whenever this occurs, we define a separate wire time associated with each instance of P seen at H's position on the link.
This definition is worth making because it serves as a reminder that notions like *the* unique time a packet passes a point in the Internet are inherently slippery.

The term wire time has historically been used to loosely denote the time at which a packet appeared on a link, without exactly specifying whether this refers to the first bit, the last bit, or some other consideration.  This informal definition is generally already very useful, as it is usually used to make a distinction between when the packet's propagation delays begin and cease to be due to the network rather than the endpoint hosts.

When appropriate, metrics should be defined in terms of wire times rather than host endpoint times, so that the metric's definition highlights the issue of separating delays due to the host from those due to the network.

We note that one potential difficulty when dealing with wire times concerns IP fragments.  It may be the case that, due to fragmentation, only a portion of a particular packet passes by H's location.  Such fragments are themselves legitimate packets and have well-defined wire times associated with them; but the larger IP packet corresponding to their aggregate may not.

We also note that these notions have not, to our knowledge, been previously defined in exact terms for Internet traffic.  Consequently, we may find with experience that these definitions require some adjustment in the future.

{Comment: It can sometimes be difficult to measure wire times.  One technique is to use a packet filter to monitor traffic on a link.  The architecture of these filters often attempts to associate with each packet a timestamp as close to the wire time as possible.  We note however that one common source of error is to run the packet filter on one of the endpoint hosts.  In this case, it has been observed that some packet filters receive for some packets timestamps corresponding to when the packet was *scheduled* to be injected into the network, rather than when it actually was *sent* out onto the network (wire time).  There can be a substantial difference between these two times.  A technique for dealing with this problem is to run the packet filter on a separate host that passively monitors the given link.  This can be problematic however for some link technologies.  See [Pa97] for a discussion of the sorts of errors packet filters can exhibit.  Finally, we note that packet filters will often only capture the first fragment of a fragmented IP packet, due to the use of filtering on fields in the IP and transport protocol headers.  As we generally desire our measurement methodologies to avoid the complexity of creating fragmented traffic, one strategy for dealing with their presence as detected by a packet filter is to flag that the measured traffic has an unusual form and abandon further analysis of the packet timing.}

10. Singletons, Samples, and Statistics

With experience we have found it useful to introduce a separation between three distinct -- yet related -- notions:
+ By a 'singleton' metric, we refer to metrics that are, in a sense, atomic.  For example, a single instance of "bulk throughput capacity" from one host to another might be defined as a singleton metric, even though the instance involves measuring the timing of a number of Internet packets.
+ By a 'sample' metric, we refer to metrics derived from a given singleton metric by taking a number of distinct instances together.  For example, we might define a sample metric of one-way delays from one host to another as an hour's worth of measurements, each made at Poisson intervals with a mean spacing of one second.
+ By a 'statistical' metric, we refer to metrics derived from a given sample metric by computing some statistic of the values defined by the singleton metric on the sample.  For example, the mean of all the one-way delay values on the sample given above might be defined as a statistical metric.

By applying these notions of singleton, sample, and statistic in a consistent way, we will be able to reuse lessons learned about how to define samples and statistics on various metrics.  The orthogonality among these three notions will thus make all our work more effective and more intelligible by the community.

In the remainder of this section, we will cover some topics in sampling and statistics that we believe will be important to a variety of metric definitions and measurement efforts.
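
As a small, hypothetical sketch of how these three notions nest (the delay values below are made up purely for illustration), an array of singleton one-way delay values plays the role of a sample, and the mean computed over it is one possible statistic:

   #include <stdio.h>
   #include <stddef.h>

   /* Each element of 'sample' is one singleton one-way delay value
    * (seconds); the array as a whole plays the role of the sample; the
    * mean computed below is one possible statistic defined on it. */
   static double mean_statistic(const double *sample, size_t n)
   {
       double sum = 0.0;
       for (size_t i = 0; i < n; i++)
           sum += sample[i];
       return (n > 0) ? sum / (double)n : 0.0;
   }

   int main(void)
   {
       double sample[] = { 0.051, 0.049, 0.063, 0.050, 0.055 };
       size_t n = sizeof(sample) / sizeof(sample[0]);
       printf("mean one-way delay: %.3f s\n", mean_statistic(sample, n));
       return 0;
   }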

10.1. Methods of Collecting Samples

The main reason for collecting samples is to see what sort of variations and consistencies are present in the metric being measured.  These variations might be with respect to different points in the Internet, or different measurement times.  When assessing variations based on a sample, one generally makes an assumption that the sample is "unbiased", meaning that the process of collecting the measurements in the sample did not skew the sample so that it no longer accurately reflects the metric's variations and consistencies.

One common way of collecting samples is to make measurements separated by fixed amounts of time: periodic sampling.  Periodic sampling is particularly attractive because of its simplicity, but it suffers from two potential problems:
+ If the metric being measured itself exhibits periodic behavior, then there is a possibility that the sampling will observe only part of the periodic behavior if the periods happen to agree (either directly, or if one is a multiple of the other).  Related to this problem is the notion that periodic sampling can be easily anticipated.  Predictable sampling is susceptible to manipulation if there are mechanisms by which a network component's behavior can be temporarily changed such that the sampling only sees the modified behavior.
+ The act of measurement can perturb what is being measured (for example, injecting measurement traffic into a network alters the congestion level of the network), and repeated periodic perturbations can drive a network into a state of synchronization (cf. [FJ94]), greatly magnifying what might individually be minor effects.

A more sound approach is based on "random additive sampling": samples are separated by independent, randomly generated intervals that have a common statistical distribution G(t) [BM92].  The quality of this sampling depends on the distribution G(t).
For example, if G(t) generates a constant value g with probability one, then the sampling reduces to periodic sampling with a period of g.

Random additive sampling gains significant advantages.  In general, it avoids synchronization effects and yields an unbiased estimate of the property being sampled.  The only significant drawbacks with it are:
+ it complicates frequency-domain analysis, because the samples do not occur at fixed intervals such as assumed by Fourier-transform techniques; and
+ unless G(t) is the exponential distribution (see below), sampling still remains somewhat predictable, as discussed for periodic sampling above.

10.1.1. Poisson Sampling

It can be proved that if G(t) is an exponential distribution with rate lambda, that is
   G(t) = 1 - exp(-lambda * t)
then the arrival of new samples *cannot* be predicted (and, again, the sampling is unbiased).  Furthermore, the sampling is asymptotically unbiased even if the act of sampling affects the network's state.  Such sampling is referred to as "Poisson sampling".  It is not prone to inducing synchronization, it can be used to accurately collect measurements of periodic behavior, and it is not prone to manipulation by anticipating when new samples will occur.

Because of these valuable properties, we in general prefer that samples of Internet measurements are gathered using Poisson sampling.  {Comment: We note, however, that there may be circumstances that favor use of a different G(t).  For example, the exponential distribution is unbounded, so its use will on occasion generate lengthy spaces between sampling times.  We might instead desire to bound the longest such interval to a maximum value dT, to speed the convergence of the estimation derived from the sampling.  This could be done by using
   G(t) = Unif(0, dT)
that is, the uniform distribution between 0 and dT.  This sampling, of course, becomes highly predictable if an interval of nearly length dT has elapsed without a sample occurring.}

In its purest form, Poisson sampling is done by generating independent, exponentially distributed intervals and gathering a single measurement after each interval has elapsed.  It can be shown that if starting at time T one performs Poisson sampling over an interval dT, during which a total of N measurements happen to be made, then those measurements will be uniformly distributed over the interval [T, T+dT].  So another way of conducting Poisson sampling is to pick dT and N and generate N random sampling times uniformly over the interval [T, T+dT].  The two approaches are equivalent, except if N and dT are externally known.  In that case, the property of not being able to predict measurement times is weakened (the other properties still hold).  The N/dT approach has an advantage that dealing with fixed values of N and dT can be simpler than dealing with a fixed lambda but variable numbers of measurements over variably-sized intervals.

10.1.2. Geometric Sampling

Closely related to Poisson sampling is "geometric sampling", in which external events are measured with a fixed probability p.  For example, one might capture all the packets over a link but only record the packet to a trace file if a randomly generated number uniformly distributed between 0 and 1 is less than a given p.  Geometric sampling has the same properties of being unbiased and not predictable in advance as Poisson sampling, so if it fits a particular Internet measurement task, it too is sound.  See [CPB93] for more discussion.
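
A minimal sketch of this trace-recording decision (the record_packet() routine and the example values are hypothetical; the POSIX drand48() routine supplies the uniform (pseudo-)random numbers):

   #include <stdio.h>
   #include <stdlib.h>

   /* Geometric sampling sketch: record each observed packet with a fixed
    * probability p by comparing a uniform [0,1) (pseudo-)random number
    * against p.  record_packet() stands in for writing to a trace file. */
   static void record_packet(int pkt_id)
   {
       printf("recorded packet %d\n", pkt_id);
   }

   static void maybe_record(int pkt_id, double p)
   {
       if (drand48() < p)
           record_packet(pkt_id);
   }

   int main(void)
   {
       srand48(1);
       for (int i = 0; i < 20; i++)   /* pretend we observed 20 packets */
           maybe_record(i, 0.25);     /* record each with probability 1/4 */
       return 0;
   }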
For exam- 1008 ple, one might capture all the packets over a link but only record 1009 the packet to a trace file if a randomly generated number uniformly 1010 distributed between 0 and 1 is less than a given p. Geometric sam- 1011 pling has the same properties of being unbiased and not predictable 1012 in advance as Poisson sampling, so if it fits a particular Internet 1013 measurement task, it too is sound. See [CPB93] for more discussion. 1015 10.1.3. Generating Poisson Sampling Intervals 1017 To generate Poisson sampling intervals, one first determines the rate 1018 lambda at which the singleton measurements will on average be made 1019 (e.g., for an average sampling interval of 30 seconds, we have lambda 1020 = 1/30, if the units of time are seconds). One then generates a 1021 series of exponentially-distributed (pseudo-)random numbers E1, E2, 1022 ..., En. The first measurement is made at time E1, the next at time 1023 E1+E2, and so on. 1025 One technique for generating exponentially-distributed (pseudo- 1026 )random numbers is based on the ability to generate U1, U2, ..., Un, 1027 (pseudo-)random numbers that are uniformly distributed between 0 and 1028 1. Many computers provide libraries that can do this. Given such 1029 Ui, to generate Ei one uses: 1030 Ei = -log(Ui) / lambda 1031 where log(Ui) is the natural logarithm of Ui. {Comment: This tech- 1032 nique is an instance of the more general "inverse transform" method 1033 for generating random numbers with a given distribution.} 1035 Implementation details: 1037 There are at least three different methods for approximating Poisson 1038 sampling, which we describe here as Methods 1 through 3. Method 1 is 1039 the easiest to implement and has the most error, and method 3 is the 1040 most difficult to implement and has the least error (potentially 1041 none). 1043 Method 1 is to proceed as follows: 1044 1. Generate E1 and wait that long. 1045 2. Perform a measurement. 1046 3. Generate E2 and wait that long. 1047 4. Perform a measurement. 1048 5. Generate E3 and wait that long. 1049 6. Perform a measurement ... 1051 ID Framework for IP Performance Metrics February 1998 1053 The problem with this approach is that the "Perform a measurement" 1054 steps themselves take time, so the sampling is not done at times E1, 1055 E1+E2, etc., but rather at E1, E1+M1+E2, etc., where Mi is the amount 1056 of time required for the i'th measurement. If Mi is very small com- 1057 pared to 1/lambda then the potential error introduced by this tech- 1058 nique is likewise small. As Mi becomes a non-negligible fraction of 1059 1/lambda, the potential error increases. 1061 Method 2 attempts to correct this error by taking into account the 1062 amount of time required by the measurements (i.e., the Mi's) and 1063 adjusting the waiting intervals accordingly: 1064 1. Generate E1 and wait that long. 1065 2. Perform a measurement and measure M1, the time it took to do so. 1066 3. Generate E2 and wait for a time E2-M1. 1067 4. Perform a measurement and measure M2 .. 1069 This approach works fine as long as E{i+1} >= Mi. But if E{i+1} < Mi 1070 then it is impossible to wait the proper amount of time. (Note that 1071 this case corresponds to needing to perform two measurements simul- 1072 taneously.) 1074 Method 3 is generating a schedule of measurement times E1, E1+E2, 1075 etc., and then sticking to it: 1076 1. Generate E1, E2, ..., En. 1077 2. Compute measurement times T1, T2, ..., Tn, as Ti = E1 + ... + Ei. 1078 3. 
Arrange that at times T1, T2, ..., Tn, a measurement is made. 1080 By allowing simultaneous measurements, Method 3 avoids the shortcom- 1081 ings of Methods 1 and 2. If, however, simultaneous measurements 1082 interfere with one another, then Method 3 does not gain any benefit 1083 and may actually prove worse than Methods 1 or 2. 1085 For Internet phenomena, it is not known to what degree the inaccura- 1086 cies of these methods are significant. If the Mi's are much less 1087 than 1/lambda, then any of the three should suffice. If the Mi's are 1088 less than 1/lambda but perhaps not greatly less, then Method 2 is 1089 preferred to Method 1. If simultaneous measurements do not interfere 1090 with one another, then Method 3 is preferred, though it can be con- 1091 siderably harder to implement. 1093 10.2. Self-Consistency 1095 A fundamental requirement for a sound measurement methodology is that 1096 measurement be made using as few unconfirmed assumptions as possible. 1097 Experience has painfully shown how easy it is to make an (often 1098 implicit) assumption that turns out to be incorrect. An example is 1099 incorporating into a measurement the reading of a clock synchronized 1101 ID Framework for IP Performance Metrics February 1998 1103 to a highly accurate source. It is easy to assume that the clock is 1104 therefore accurate; but due to software bugs, a loss of power in the 1105 source, or a loss of communication between the source and the clock, 1106 the clock could actually be quite inaccurate. 1108 This is not to argue that one must not make *any* assumptions when 1109 measuring, but rather that, to the extent which is practical, assump- 1110 tions should be tested. One powerful way for doing so involves 1111 checking for self-consistency. Such checking applies both to the 1112 observed value(s) of the measurement *and the values used by the 1113 measurement process itself*. A simple example of the former is that 1114 when computing a round trip time, one should check to see if it is 1115 negative. Since negative time intervals are non-physical, if it ever 1116 is negative that finding immediately flags an error. *These sorts of 1117 errors should then be investigated!* It is crucial to determine where 1118 the error lies, because only by doing so diligently can we build up 1119 faith in a methodology's fundamental soundness. For example, it 1120 could be that the round trip time is negative because during the 1121 measurement the clock was set backward in the process of synchroniz- 1122 ing it with another source. But it could also be that the measure- 1123 ment program accesses uninitialized memory in one of its computations 1124 and, only very rarely, that leads to a bogus computation. This 1125 second error is more serious, if the same program is used by others 1126 to perform the same measurement, since then they too will suffer from 1127 incorrect results. Furthermore, once uncovered it can be completely 1128 fixed. 1130 A more subtle example of testing for self-consistency comes from 1131 gathering samples of one-way Internet delays. If one has a large 1132 sample of such delays, it may well be highly telling to, for example, 1133 fit a line to the pairs of (time of measurement, measured delay), to 1134 see if the resulting line has a clearly non-zero slope. If so, a 1135 possible interpretation is that one of the clocks used in the meas- 1136 urements is skewed relative to the other. Another interpretation is 1137 that the slope is actually due to genuine network effects. 
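{Comment: A minimal sketch of such a check follows, purely for illustration; the arrays t[] and d[] holding the measurement times and the measured delays, and the use of a simple least-squares fit, are our own choices rather than part of the framework.}

/* Illustrative only: slope of the ordinary least-squares line fitted
 * to the pairs (t[i], d[i]) of measurement time and measured one-way
 * delay.  Assumes n >= 2 and that the measurement times are not all
 * identical.  A clearly non-zero result suggests either relative
 * clock skew or a genuine network trend.
 */
double
delay_trend_slope(const double t[], const double d[], int n)
{
    double sum_t = 0.0, sum_d = 0.0, sum_tt = 0.0, sum_td = 0.0;
    int i;

    for (i = 0; i < n; ++i) {
        sum_t  += t[i];
        sum_d  += d[i];
        sum_tt += t[i] * t[i];
        sum_td += t[i] * d[i];
    }
    return (n * sum_td - sum_t * sum_d) / (n * sum_tt - sum_t * sum_t);
}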
Determin- 1138 ing which is indeed the case will often be highly illuminating. (See 1139 [Pa97] for a discussion of distinguishing between relative clock skew 1140 and genuine network effects.) Furthermore, if making this check is 1141 part of the methodology, then a finding that the long-term slope is 1142 very near zero is positive evidence that the measurements are prob- 1143 ably not biased by a difference in skew. 1145 A final example illustrates checking the measurement process itself 1146 for self-consistency. Above we outline Poisson sampling techniques, 1147 based on generating exponentially-distributed intervals. A sound 1148 measurement methodology would include testing the generated intervals 1149 to see whether they are indeed exponentially distributed (and also to 1150 see if they suffer from correlation). In the appendix we discuss and 1152 ID Framework for IP Performance Metrics February 1998 1154 give C code for one such technique, a general-purpose, well-regarded 1155 goodness-of-fit test called the Anderson-Darling test. 1157 Finally, we note that what is truly relevant for Poisson sampling of 1158 Internet metrics is often not when the measurements began but the 1159 wire times corresponding to the measurement process. These could 1160 well be different, due to complications on the hosts used to perform 1161 the measurement. Thus, even those with complete faith in their 1162 pseudo-random number generators and subsequent algorithms are 1163 encouraged to consider how they might test the assumptions of each 1164 measurement procedure as much as possible. 1166 10.3. Defining Statistical Distributions 1168 One way of describing a collection of measurements (a sample) is as a 1169 statistical distribution -- informally, as percentiles. There are 1170 several slightly different ways of doing so. In this section we 1171 give a standard definition to provide uniformity to these descrip- 1172 tions. 1174 The "empirical distribution function" (EDF) of a set of scalar meas- 1175 urements is a function F(x) which for any x gives the fractional pro- 1176 portion of the total measurements that were <= x. If x is less than 1177 the minimum value observed, then F(x) is 0. If it is greater than or 1178 equal to the maximum value observed, then F(x) is 1. 1180 For example, given the 6 measurements: 1181 -2, 7, 7, 4, 18, -5 1182 Then F(-8) = 0, F(-5) = 1/6, F(-5.0001) = 0, F(-4.999) = 1/6, F(7) = 1183 5/6, F(18) = 1, F(239) = 1. 1185 Note that we can recover the different measured values and how many 1186 times each occurred from F(x) -- no information regarding the range 1187 in values is lost. Summarizing measurements using histograms, on the 1188 other hand, in general loses information about the different values 1189 observed, so the EDF is preferred. 1191 Using either the EDF or a histogram, however, we do lose information 1192 regarding the order in which the values were observed. Whether this 1193 loss is potentially significant will depend on the metric being meas- 1194 ured. 1196 We will use the term "percentile" to refer to the smallest value of x 1197 for which F(x) >= a given percentage. So the 50th percentile of the 1198 example above is 4, since F(4) = 3/6 = 50%; the 25th percentile is 1199 -2, since F(-5) = 1/6 < 25%, and F(-2) = 2/6 >= 25%; the 100th per- 1200 centile is 18; the 0th percentile is -infinity; and the 15th 1202 ID Framework for IP Performance Metrics February 1998 1204 percentile is -5, since F(-5) = 1/6 >= 15%.
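{Comment: The following sketch, for illustration only, computes F(x) and a percentile directly from these definitions; the function names are ours. Both routines assume the measurements have already been sorted into ascending order, for example with qsort().}

#include <math.h>

/* Illustrative only: F(v) for n measurements x[0..n-1] sorted into
 * ascending order -- the fraction of the measurements that are <= v.
 */
double
edf(const double x[], int n, double v)
{
    int i = 0;

    while (i < n && x[i] <= v)
        ++i;
    return (double) i / n;
}

/* Illustrative only: the smallest value v for which F(v) >= pct,
 * with pct given as a fraction (0.5 for the 50th percentile).
 * For pct <= 0 the percentile is minus infinity, per the text.
 */
double
percentile(const double x[], int n, double pct)
{
    int i;

    if (pct <= 0.0)
        return -HUGE_VAL;
    for (i = 0; i < n; ++i)
        if ((double) (i + 1) / n >= pct)    /* F(x[i]) >= (i+1)/n >= pct */
            return x[i];
    return x[n-1];    /* not reached for 0 < pct <= 1 */
}

Applied to the six measurements above (sorted as -5, -2, 4, 7, 7, 18), percentile(x, 6, 0.50) returns 4 and percentile(x, 6, 0.25) returns -2, matching the values worked out above.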
1206 Care must be taken when using percentiles to summarize a sample, 1207 because they can lend an unwarranted appearance of more precision 1208 than is really available. Any such summary must include the sample 1209 size N, because any percentile difference finer than 1/N is below the 1210 resolution of the sample. 1212 See [DS86] for more details regarding EDF's. 1214 We close with a note on the common (and important!) notion of median. 1215 In statistics, the median of a distribution is defined to be the 1216 point X for which the probability of observing a value <= X is equal 1217 to the probability of observing a value > X. When estimating the 1218 median of a set of observations, the estimate depends on whether the 1219 number of observations, N, is odd or even: 1220 + If N is odd, then the 50th percentile as defined above is used as 1221 the estimated median. 1222 + If N is even, then the estimated median is the average of the cen- 1223 tral two observations; that is, if the observations are sorted in 1224 ascending order and numbered from 1 to N, where N = 2*K, then the 1225 estimated median is the average of the (K)'th and (K+1)'th obser- 1226 vations. 1227 Usually the term "estimated" is dropped from the phrase "estimated 1228 median" and this value is simply referred to as the "median". 1230 10.4. Testing For Goodness-of-Fit 1232 For some forms of measurement calibration we need to test whether a 1233 set of numbers is consistent with those numbers having been drawn 1234 from a particular distribution. An example is that to apply a self- 1235 consistency check to measurements made using a Poisson process, one 1236 test is to see whether the spacing between the sampling times does 1237 indeed reflect an exponential distribution; or if the dT/N approach 1238 discussed above was used, whether the times are uniformly distributed 1239 across [T, dT]. 1241 {Comment: There are at least three possible sets of values we could 1242 test: the scheduled packet transmission times, as determined by use 1243 of a pseudo-random number generator; user-level timestamps made just 1244 before or after the system call for transmitting the packet; and wire 1245 times for the packets as recorded using a packet filter. All three 1246 of these are potentially informative: failures for the scheduled 1247 times to match an exponential distribution indicate inaccuracies in 1248 the random number generation; failures for the user-level times indi- 1249 cate inaccuracies in the timers used to schedule transmission; and 1250 failures for the wire times indicate inaccuracies in actually 1252 ID Framework for IP Performance Metrics February 1998 1254 transmitting the packets, perhaps due to contention for a shared 1255 resource.} 1257 There are a large number of statistical goodness-of-fit techniques 1258 for performing such tests. See [DS86] for a thorough discussion. 1259 That reference recommends the Anderson-Darling EDF test as being a 1260 good all-purpose test, as well as one that is especially good at 1261 detecting deviations from a given distribution in the lower and upper 1262 tails of the EDF. 1264 It is important to understand that the nature of goodness-of-fit 1265 tests is that one first selects a "significance level", which is the 1266 probability that the test will erroneously declare that the EDF of a 1267 given set of measurements fails to match a particular distribution 1268 when in fact the measurements do indeed reflect that distribution. 
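{Comment: As a purely illustrative sketch of this usage, one might apply the routine exp_A2_known_mean() given in the appendix to the generated Poisson intervals and compare the result against a significance level chosen in advance (here 5%, the level adopted below); the wrapper function and variable names are ours.}

extern double exp_A2_known_mean(double x[], int n, double mean);

/* Illustrative only: returns 1 if the n generated spacings are
 * consistent with an exponential distribution of the intended mean
 * (1/lambda), 0 if not, and -1 if the test could not be applied.
 * Note that exp_A2_known_mean() sorts intervals[] in place.
 */
int
check_poisson_intervals(double intervals[], int n, double mean)
{
    const double significance_level = 0.05;  /* chosen before testing */
    double sig = exp_A2_known_mean(intervals, n, mean);

    if (sig < 0.0)
        return -1;    /* too few values, or values out of range */
    return sig >= significance_level ? 1 : 0;
}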
1269 Unless otherwise stated, IPPM goodness-of-fit tests are done using 5% 1270 significance. This means that if the test is applied to 100 samples 1271 and 5 of those samples are deemed to have failed the test, then the 1272 samples are all consistent with the distribution being tested. If 1273 significantly more of the samples fail the test, then the assumption 1274 that the samples are consistent with the distribution being tested 1275 must be rejected. If significantly fewer of the samples fail the 1276 test, then the samples have potentially been doctored too well to fit 1277 the distribution. Similarly, some goodness-of-fit tests (including 1278 Anderson-Darling) can detect whether it is likely that a given sample 1279 was doctored. We also use a significance of 5% for this case; that 1280 is, the test will report that a given honest sample is "too good to 1281 be true" 5% of the time, so if the test reports this finding signifi- 1282 cantly more often than one time out of twenty, it is an indication 1283 that something unusual is occurring. 1285 The appendix gives sample C code for implementing the Anderson- 1286 Darling test, as well as further discussing its use. 1288 See [Pa94] for a discussion of goodness-of-fit and closeness-of-fit 1289 tests in the context of network measurement. 1291 11. Avoiding Stochastic Metrics 1293 When defining metrics applying to a path, subpath, cloud, or other 1294 network element, we in general do not define them in stochastic terms 1295 (probabilities). We instead prefer a deterministic definition. So, 1296 for example, rather than defining a metric about a "packet loss pro- 1297 bability between A and B", we would define a metric about a "packet 1298 loss rate between A and B". (A measurement given by the first defin- 1299 ition might be "0.73", and by the second "73 packets out of 100".) 1301 ID Framework for IP Performance Metrics February 1998 1303 We emphasize that the above distinction concerns the *definitions* of 1304 *metrics*. It is not intended to apply to what sort of techniques we 1305 might use to analyze the results of measurements. 1307 The reason for this distinction is as follows. When definitions are 1308 made in terms of probabilities, there are often hidden assumptions in 1309 the definition about a stochastic model of the behavior being meas- 1310 ured. The fundamental goal with avoiding probabilities in our metric 1311 definitions is to avoid biasing our definitions by these hidden 1312 assumptions. 1314 For example, an easy hidden assumption to make is that packet loss in 1315 a network component due to queueing overflows can be described as 1316 something that happens to any given packet with a particular proba- 1317 bility. In today's Internet, however, queueing drops are actually 1318 usually *deterministic*, and assuming that they should be described 1319 probabilistically can obscure crucial correlations between queueing 1320 drops among a set of packets. So it's better to explicitly note sto- 1321 chastic assumptions, rather than have them sneak into our definitions 1322 implicitly. 1324 This does *not* mean that we abandon stochastic models for *under- 1325 standing* network performance! It only means that when defining IP 1326 metrics we avoid terms such as "probability" for terms like "propor- 1327 tion" or "rate". We will still use, for example, random sampling in 1328 order to estimate probabilities used by stochastic models related to 1329 the IP metrics. 
We also do not rule out the possibility of stochas- 1330 tic metrics when they are truly appropriate (for example, perhaps to 1331 model transmission errors caused by certain types of line noise). 1333 12. Packets of Type P 1335 A fundamental property of many Internet metrics is that the value of 1336 the metric depends on the type of IP packet(s) used to make the meas- 1337 urement. Consider an IP-connectivity metric: one obtains different 1338 results depending on whether one is interested in connectivity for 1339 packets destined for well-known TCP ports or unreserved UDP ports, or 1340 those with invalid IP checksums, or those with TTL's of 16, for exam- 1341 ple. In some circumstances these distinctions will be highly 1342 interesting (for example, in the presence of firewalls, or RSVP 1343 reservations). 1345 Because of this distinction, we introduce the generic notion of a 1346 "packet of type P", where in some contexts P will be explicitly 1347 defined (i.e., exactly what type of packet we mean), partially 1348 defined (e.g., "with a payload of B octets"), or left generic. Thus 1349 we may talk about generic IP-type-P-connectivity or more specific 1351 ID Framework for IP Performance Metrics February 1998 1353 IP-port-HTTP-connectivity. Some metrics and methodologies may be 1354 fruitfully defined using generic type P definitions which are then 1355 made specific when performing actual measurements. 1357 Whenever a metric's value depends on the type of the packets involved 1358 in the metric, the metric's name will include either a specific type 1359 or a phrase such as "type-P". Thus we will not define an "IP- 1360 connectivity" metric but instead an "IP-type-P-connectivity" metric 1361 and/or perhaps an "IP-port-HTTP-connectivity" metric. This naming 1362 convention serves as an important reminder that one must be conscious 1363 of the exact type of traffic being measured. 1365 A closely related note: it would be very useful to know if a given 1366 Internet component treats equally a class C of different types of 1367 packets. If so, then any one of those types of packets can be used 1368 for subsequent measurement of the component. This suggests we devise 1369 a metric or suite of metrics that attempt to determine C. 1371 13. Internet Addresses vs. Hosts 1373 When considering a metric for some path through the Internet, it is 1374 often natural to think about it as being for the path from Internet 1375 host H1 to host H2. A definition in these terms, though, can be 1376 ambiguous, because Internet hosts can be attached to more than one 1377 network. In this case, the result of the metric will depend on which 1378 of these networks is actually used. 1380 Because of this ambiguity, such definitions should usually instead 1381 be given in terms of Internet IP addresses. For the common case of 1382 a unidirectional path through the Internet, we will use the term 1383 "Src" to denote the IP address of the beginning of the path, and 1384 "Dst" to denote the IP address of the end. 1386 14. Standard-Formed Packets 1388 Unless otherwise stated, all metric definitions that concern IP pack- 1389 ets include an implicit assumption that the packet is *standard 1390 formed*. A packet is standard formed if it meets all of the follow- 1391 ing criteria: 1392 + Its length as given in the IP header corresponds to the size of 1393 the IP header plus the size of the payload.
1395 ID Framework for IP Performance Metrics February 1998 1397 + It includes a valid IP header: the version field is 4 (later, we 1398 will expand this to include 6); the header length is >= 5; the 1399 checksum is correct. 1400 + It is not an IP fragment. 1401 + The source and destination addresses correspond to the hosts in 1402 question. 1403 + Either the packet possesses sufficient TTL to travel from the 1404 source to the destination if the TTL is decremented by one at each 1405 hop, or it possesses the maximum TTL of 255. 1406 + It does not contain IP options unless explicitly noted. 1407 + If a transport header is present, it too contains a valid checksum 1408 and other valid fields. 1409 We further require that if a packet is described as having a "length 1410 of B octets", then 0 <= B <= 65535; and if B is the payload length in 1411 octets, then B <= (65535-IP header size in octets). 1413 So, for example, one might imagine defining an IP connectivity metric 1414 as "IP-type-P-connectivity for standard-formed packets with the IP 1415 TOS field set to 0", or, more succinctly, "IP-type-P-connectivity 1416 with the IP TOS field set to 0", since standard-formed is already 1417 implied by convention. 1419 A particular type of standard-formed packet often useful to consider 1420 is the "minimal IP packet from A to B" - this is an IP packet with 1421 the following properties: 1422 + It is standard-formed. 1423 + Its data payload is 0 octets. 1424 + It contains no options. 1425 (Note that we do not define its protocol field, as different values 1426 may lead to different treatment by the network.) 1428 When defining IP metrics we keep in mind that no packet smaller or 1429 simpler than this can be transmitted over a correctly operating IP 1430 network. 1432 15. Acknowledgements 1434 The comments of Brian Carpenter, Bill Cerveny, Padma Krishnaswamy 1435 Jeff Sedayao and Howard Stanislevic are appreciated. 1437 ID Framework for IP Performance Metrics February 1998 1439 16. Security Considerations 1441 This document concerns definitions and concepts related to Internet 1442 measurement. We discuss measurement procedures only in high-level 1443 terms, regarding principles that lend themselves to sound measure- 1444 ment. As such, the topics discussed do not affect the security of 1445 the Internet or of applications which run on it. 1447 That said, it should be recognized that conducting Internet measure- 1448 ments can raise both security and privacy concerns. Active tech- 1449 niques, in which traffic is injected into the network, can be abused 1450 for denial-of-service attacks disguised as legitimate measurement 1451 activity. Passive techniques, in which existing traffic is recorded 1452 and analyzed, can expose the contents of Internet traffic to unin- 1453 tended recipients. Consequently, the definition of each metric and 1454 methodology must include a corresponding discussion of security con- 1455 siderations. 1457 17. Appendix 1459 Below we give routines written in C for computing the Anderson- 1460 Darling test statistic (A2) for determining whether a set of values 1461 is consistent with a given statistical distribution. Externally, the 1462 two main routines of interest are: 1463 double exp_A2_known_mean(double x[], int n, double mean) 1464 double unif_A2_known_range(double x[], int n, 1465 double min_val, double max_val) 1466 Both take as their first argument, x, the array of n values to be 1467 tested. (Upon return, the elements of x are sorted.) 
The remaining 1468 parameters characterize the distribution to be used: either the mean 1469 (1/lambda), for an exponential distribution, or the lower and upper 1470 bounds, for a uniform distribution. The names of the routines stress 1471 that these values must be known in advance, and *not* estimated from 1472 the data (for example, by computing its sample mean). Estimating the 1473 parameters from the data *changes* the significance level of the test 1474 statistic. While [DS86] gives alternate significance tables for some 1475 instances in which the parameters are estimated from the data, for 1476 our purposes we expect that we should indeed know the parameters in 1477 advance, since what we will be testing are generally values such as 1478 packet sending times that we wish to verify follow a known distribu- 1479 tion. 1481 Both routines return a significance level, as described earlier. This 1482 is a value between 0 and 1. The correct use of the routines is to 1483 pick in advance the threshold for the significance level to test; 1484 generally, this will be 0.05, corresponding to 5%, as also described 1485 above. Subsequently, if the routines return a value strictly less 1487 ID Framework for IP Performance Metrics February 1998 1489 than this threshold, then the data are deemed to be inconsistent with 1490 the presumed distribution, *subject to an error corresponding to the 1491 significance level*. That is, for a significance level of 5%, 5% of 1492 the time data that is indeed drawn from the presumed distribution 1493 will be erroneously deemed inconsistent. 1495 Thus, it is important to bear in mind that if these routines are used 1496 frequently, then one will indeed encounter occasional failures, even 1497 if the data is unblemished. 1499 Another important point concerning significance levels is that it is 1500 unsound to compare them in order to determine which of two sets of 1501 values is a "better" fit to a presumed distribution. Such testing 1502 should instead be done using "closeness-of-fit metrics" such as the 1503 lambda^2 metric described in [Pa94]. 1505 While the routines provided are for exponential and uniform distribu- 1506 tions with known parameters, it is generally straightforward to 1507 write comparable routines for any distribution with known parameters. 1508 The heart of the A2 tests lies in a statistic computed for testing 1509 whether a set of values is consistent with a uniform distribution 1510 between 0 and 1, which we term Unif(0, 1). If we wish to test 1511 whether a set of values, X, is consistent with a given distribution 1512 G(x), we first compute 1513 Y = G_inverse(X) 1514 If X is indeed distributed according to G(x), then Y will be distri- 1515 buted according to Unif(0, 1); so by testing Y for consistency with 1516 Unif(0, 1), we also test X for consistency with G(x). 1518 We note, however, that the process of computing Y above might yield 1519 values of Y outside the range (0..1). Such values should not occur 1520 if X is indeed distributed according to G(x), but easily can occur if 1521 it is not. In the latter case, we need to avoid computing the cen- 1522 tral A2 statistic, since floating-point exceptions may occur if any 1523 of the values lie outside (0..1). Accordingly, the routines check 1524 for this possibility, and if encountered, return a raw A2 statistic of 1525 -1. The routine that converts the raw A2 statistic to a significance 1526 level likewise propagates this value, returning a significance level 1527 of -1.
So, any use of these routines must be prepared for a possible 1528 negative significance level. 1530 The last important point regarding use of the A2 statistic concerns n, 1531 the number of values being tested. If n < 5 then the test is not 1532 meaningful, and in this case a significance level of -1 is returned. 1534 On the other hand, for "real" data the test *gains* power as n 1535 becomes larger. It is well known in the statistics community that 1536 real data almost never exactly matches a theoretical distribution, 1538 ID Framework for IP Performance Metrics February 1998 1540 even in cases such as rolling dice a great many times (see [Pa94] for 1541 a brief discussion and references). The A2 test is sensitive enough 1542 that, for sufficiently large sets of real data, the test will almost 1543 always fail, because it will manage to detect slight imperfections in 1544 the fit of the data to the distribution. 1546 For example, we have found that when testing 8,192 measured wire 1547 times for packets sent at Poisson intervals, the measurements almost 1548 always fail the A2 test. On the other hand, testing 128 measurements 1549 failed at 5% significance only about 5% of the time, as expected. 1550 Thus, in general, when the test fails, care must be taken to under- 1551 stand why it failed. 1553 The remainder of this appendix gives C code for the routines men- 1554 tioned above. 1556 /* Routines for computing the Anderson-Darling A2 test statistic. 1557 * 1558 * Implemented based on the description in "Goodness-of-Fit 1559 * Techniques," R. D'Agostino and M. Stephens, editors, 1560 * Marcel Dekker, Inc., 1986. 1561 */ 1563 #include <stdio.h> 1564 #include <stdlib.h> 1565 #include <math.h> 1567 /* Returns the raw A^2 test statistic for n sorted samples 1568 * z[0] .. z[n-1], for z ~ Unif(0,1). 1569 */ 1570 extern double compute_A2(double z[], int n); 1572 /* Returns the significance level associated with an A^2 test 1573 * statistic value of A2, assuming no parameters of the tested 1574 * distribution were estimated from the data. 1575 */ 1576 extern double A2_significance(double A2); 1578 /* Returns the A^2 significance level for testing n observations 1579 * x[0] .. x[n-1] against an exponential distribution with the 1580 * given mean. 1581 * 1582 * SIDE EFFECT: the x[0..n-1] are sorted upon return. 1583 */ 1584 extern double exp_A2_known_mean(double x[], int n, double mean); 1586 /* Returns the A^2 significance level for testing n observations 1587 * x[0] .. x[n-1] against the uniform distribution [min_val, max_val]. 1589 ID Framework for IP Performance Metrics February 1998 1591 * 1592 * SIDE EFFECT: the x[0..n-1] are sorted upon return. 1593 */ 1594 extern double unif_A2_known_range(double x[], int n, 1595 double min_val, double max_val); 1597 /* Returns a pseudo-random number distributed according to an 1598 * exponential distribution with the given mean. 1599 */ 1600 extern double random_exponential(double mean); 1602 /* Helper function used by qsort() to sort double-precision 1603 * floating-point values. 1604 */ 1605 static int 1606 compare_double(const void *v1, const void *v2) 1607 { 1608 double d1 = *(double *) v1; 1609 double d2 = *(double *) v2; 1611 if (d1 < d2) 1612 return -1; 1613 else if (d1 > d2) 1614 return 1; 1615 else 1616 return 0; 1617 } 1619 double 1620 compute_A2(double z[], int n) 1621 { 1622 int i; 1623 double sum = 0.0; 1625 if ( n < 5 ) 1626 /* Too few values.
*/ 1627 return -1.0; 1629 /* If any of the values are outside the range (0, 1) then 1630 * fail immediately (and avoid a possible floating point 1631 * exception in the code below). 1632 */ 1633 for (i = 0; i < n; ++i) 1634 if ( z[i] <= 0.0 || z[i] >= 1.0 ) 1635 return -1.0; 1637 /* Page 101 of D'Agostino and Stephens. */ 1639 ID Framework for IP Performance Metrics February 1998 1641 for (i = 1; i <= n; ++i) { 1642 sum += (2 * i - 1) * log(z[i-1]); 1643 sum += (2 * n + 1 - 2 * i) * log(1.0 - z[i-1]); 1644 } 1645 return -n - (1.0 / n) * sum; 1646 } 1648 double 1649 A2_significance(double A2) 1650 { 1651 /* Page 105 of D'Agostino and Stephens. */ 1652 if (A2 < 0.0) 1653 return A2; /* Bogus A2 value - propagate it. */ 1655 /* Check for possibly doctored values. */ 1656 if (A2 <= 0.201) 1657 return 0.99; 1658 else if (A2 <= 0.240) 1659 return 0.975; 1660 else if (A2 <= 0.283) 1661 return 0.95; 1662 else if (A2 <= 0.346) 1663 return 0.90; 1664 else if (A2 <= 0.399) 1665 return 0.85; 1667 /* Now check for possible inconsistency. */ 1668 if (A2 <= 1.248) 1669 return 0.25; 1670 else if (A2 <= 1.610) 1671 return 0.15; 1672 else if (A2 <= 1.933) 1673 return 0.10; 1674 else if (A2 <= 2.492) 1675 return 0.05; 1676 else if (A2 <= 3.070) 1677 return 0.025; 1678 else if (A2 <= 3.880) 1679 return 0.01; 1680 else if (A2 <= 4.500) 1681 return 0.005; 1682 else if (A2 <= 6.000) 1683 return 0.001; 1684 else 1685 return 0.0; 1686 } 1688 double 1690 ID Framework for IP Performance Metrics February 1998 1692 exp_A2_known_mean(double x[], int n, double mean) 1693 { 1694 int i; 1695 double A2; 1697 /* Sort the first n values. */ 1698 qsort(x, n, sizeof(x[0]), compare_double); 1700 /* Assuming they match an exponential distribution, transform 1701 * them to Unif(0,1). 1702 */ 1703 for (i = 0; i < n; ++i) { 1704 x[i] = 1.0 - exp(-x[i] / mean); 1705 } 1707 /* Now make the A^2 test to see if they're truly uniform. */ 1708 A2 = compute_A2(x, n); 1709 return A2_significance(A2); 1710 } 1712 double 1713 unif_A2_known_range(double x[], int n, double min_val, double max_val) 1714 { 1715 int i; 1716 double A2; 1717 double range = max_val - min_val; 1719 /* Sort the first n values. */ 1720 qsort(x, n, sizeof(x[0]), compare_double); 1722 /* Transform Unif(min_val, max_val) to Unif(0,1). */ 1723 for (i = 0; i < n; ++i) 1724 x[i] = (x[i] - min_val) / range; 1726 /* Now make the A^2 test to see if they're truly uniform. */ 1727 A2 = compute_A2(x, n); 1728 return A2_significance(A2); 1729 } 1731 double 1732 random_exponential(double mean) 1733 { 1734 return -mean * log1p(-drand48()); 1735 } 1737 ID Framework for IP Performance Metrics February 1998 1739 18. References 1741 [AK96] G. Almes and S. Kalidindi, "A One-way Delay Metric for IPPM", 1742 work in progress, November 1997. 1744 [BM92] I. Bilinskis and A. Mikelsons, Randomized Signal Processing, 1745 Prentice Hall International, 1992. 1747 [DS86] R. D'Agostino and M. Stephens, editors, Goodness-of-Fit Tech- 1748 niques, Marcel Dekker, Inc., 1986. 1750 [CPB93] K. Claffy, G. Polyzos, and H-W. Braun, ``Application of Sam- 1751 pling Methodologies to Network Traffic Characterization,'' Proc. 1752 SIGCOMM '93, pp. 194-203, San Francisco, September 1993. 1754 [FJ94] S. Floyd and V. Jacobson, ``The Synchronization of Periodic 1755 Routing Messages,'' IEEE/ACM Transactions on Networking, 2(2), pp. 1756 122-136, April 1994. 1758 [Mi92] D. Mills, "Network Time Protocol (Version 3) Specification, 1759 Implementation and Analysis", RFC 1305, March 1992. 1761 [Pa94] V. 
Paxson, ``Empirically-Derived Analytic Models of Wide-Area 1762 TCP Connections,'' IEEE/ACM Transactions on Networking, 2(4), pp. 1763 316-336, August 1994. 1765 [Pa96] V. Paxson, ``Towards a Framework for Defining Internet Perfor- 1766 mance Metrics,'' Proceedings of INET '96, 1767 ftp://ftp.ee.lbl.gov/papers/metrics-framework-INET96.ps.Z 1769 [Pa97] V. Paxson, ``Measurements and Analysis of End-to-End Internet 1770 Dynamics,'' Ph.D. dissertation, U.C. Berkeley, 1997, 1771 ftp://ftp.ee.lbl.gov/papers/vp-thesis/dis.ps.gz. 1773 19. Authors' Addresses 1775 Vern Paxson 1776 MS 50B/2239 1777 Lawrence Berkeley National Laboratory 1778 University of California 1779 Berkeley, CA 94720 1780 USA 1781 Phone: +1 510/486-7504 1783 Guy Almes 1784 Advanced Network & Services, Inc. 1785 200 Business Park Drive 1787 ID Framework for IP Performance Metrics February 1998 1789 Armonk, NY 10504 1790 USA 1791 Phone: +1 914/273-7863 1793 Jamshid Mahdavi 1794 Pittsburgh Supercomputing Center 1795 4400 5th Avenue 1796 Pittsburgh, PA 15213 1797 USA 1798 Phone: +1 412/268-6282 1800 Matt Mathis 1801 Pittsburgh Supercomputing Center 1802 4400 5th Avenue 1803 Pittsburgh, PA 15213 1804 USA 1805 Phone: +1 412/268-3319