Network Working Group              V. Paxson, Lawrence Berkeley National Lab
Internet Draft                     G. Almes, Advanced Network & Services
                                   J. Mahdavi, Pittsburgh Supercomputer Center
                                   M. Mathis, Pittsburgh Supercomputer Center
Expiration Date: May 1998          November 1997

                    Framework for IP Performance Metrics

1. Status of this Memo

This document is an Internet Draft.  Internet Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups.  Note that other groups may also distribute working documents as Internet Drafts.

Internet Drafts are draft documents valid for a maximum of six months, and may be updated, replaced, or obsoleted by other documents at any time.  It is inappropriate to use Internet Drafts as reference material or to cite them other than as ``work in progress''.

To learn the current status of any Internet Draft, please check the ``1id-abstracts.txt'' listing contained in the Internet Drafts shadow directories on ftp.is.co.za (Africa), nic.nordu.net (Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or ftp.isi.edu (US West Coast).

This memo provides information for the Internet community.  This memo does not specify an Internet standard of any kind.  Distribution of this memo is unlimited.
2. Introduction

The purpose of this memo is to define a general framework for particular metrics to be developed by the IETF's IP Performance Metrics effort, begun by the Benchmarking Methodology Working Group (BMWG) of the Operational Requirements Area, and being continued by the IP Performance Metrics Working Group (IPPM) of the Transport Area.

We begin by laying out several criteria for the metrics that we adopt.  These criteria are designed to promote an IPPM effort that will maximize an accurate common understanding by Internet users and Internet providers of the performance and reliability both of end-to-end paths through the Internet and of specific 'IP clouds' that comprise portions of those paths.

We next define some Internet vocabulary that will allow us to speak clearly about Internet components such as routers, paths, and clouds.

We then define the fundamental concepts of 'metric' and 'measurement methodology', which allow us to speak clearly about measurement issues.  Given these concepts, we proceed to discuss the important issue of measurement uncertainties and errors, and develop a key, somewhat subtle notion of how they relate to the analytical framework shared by many aspects of the Internet engineering discipline.  We then introduce the notion of empirically defined metrics, and finish this part of the document with a general discussion of how metrics can be 'composed'.

The remainder of the document deals with a variety of issues related to defining sound metrics and methodologies: how to deal with imperfect clocks; the notion of 'wire time' as distinct from 'host time'; how to aggregate sets of singleton metrics into samples and derive sound statistics from those samples; why it is recommended to avoid thinking about Internet properties in probabilistic terms (such as the probability that a packet is dropped), since these terms often include implicit assumptions about how the network behaves; the utility of defining metrics in terms of packets of a generic type; the benefits of preferring IP addresses to DNS host names; and the notion of 'standard-formed' packets.  An appendix discusses the Anderson-Darling test for gauging whether a set of values matches a given statistical distribution, and gives C code for an implementation of the test.

In some sections of the memo, we will surround some commentary text with the brackets {Comment: ... }.  We stress that this commentary is only commentary, and is not itself part of the framework document or a proposal of particular metrics.  In some cases this commentary will discuss some of the properties of metrics that might be envisioned, but the reader should assume that any such discussion is intended only to shed light on points made in the framework document, and not to suggest any specific metrics.

3. Criteria for IP Performance Metrics

The overarching goal of the IP Performance Metrics effort is to achieve a situation in which users and providers of Internet transport service have an accurate common understanding of the performance and reliability of the Internet component 'clouds' that they use/provide.

To achieve this, performance and reliability metrics for paths through the Internet must be developed.
In several IETF meetings, criteria for these metrics have been specified:
+ The metrics must be concrete and well-defined,
+ A methodology for a metric should have the property that it is repeatable: if the methodology is used multiple times under identical conditions, the same measurements should result.
+ The metrics must exhibit no bias for IP clouds implemented with identical technology,
+ The metrics must exhibit understood and fair bias for IP clouds implemented with non-identical technology,
+ The metrics must be useful to users and providers in understanding the performance they experience or provide,
+ The metrics must avoid inducing artificial performance goals.

4. Terminology for Paths and Clouds

The following list defines terms that need to be precise in the development of path metrics.  We begin with the low-level notions of host, link, and router, and then build up to the notion of a path and of decomposing a path into relevant pieces.

host
    A computer capable of communicating using the Internet protocols; includes "routers".

link
    A single link-level connection between two (or more) hosts; includes leased lines, ethernets, frame relay clouds, etc.

router
    A host which facilitates network-level communication between hosts by forwarding IP packets.

path
    A sequence of the form < h0, l1, h1, ..., ln, hn >, where n >= 0, each hi is a host, each li is a link between hi-1 and hi, and each h1...hn-1 is a router.  A pair < li, hi > is termed a 'hop'.  In an appropriate operational configuration, the links and routers in the path facilitate network-layer communication of packets from h0 to hn.  Note that a path is a unidirectional concept.

subpath
    Given a path, a subpath is any subsequence of the given path which is itself a path.  (Thus, the first and last element of a subpath is a host.)

cloud
    An undirected (possibly cyclic) graph whose vertices are routers and whose edges are links that connect pairs of routers.  Formally, ethernets, frame relay clouds, and other links that connect more than two routers are modelled as fully-connected meshes of graph edges.  Note that to connect to a cloud means to connect to a router of the cloud over a link; this link is not itself part of the cloud.

exchange
    A special case of a link, an exchange directly connects either a host to a cloud and/or one cloud to another cloud.

cloud subpath
    A subpath of a given path, all of whose hosts are routers of a given cloud.

path digest
    A sequence of the form < h0, e1, C1, ..., en, hn >, where n >= 0, h0 and hn are hosts, each e1 ... en is an exchange, and each C1 ... Cn-1 is a cloud subpath.

5. Fundamental Concepts

5.1. Metrics

In the operational Internet, there are several quantities related to the performance and reliability of the Internet that we'd like to know the value of.  When such a quantity is carefully specified, we term the quantity a metric.  We anticipate that there will be separate RFCs for each metric (or for each closely related group of metrics).

In some cases, there might be no obvious means to effectively measure the metric; this is allowed, and even understood to be very useful in some cases.  It is required, however, that the specification of the metric be as clear as possible about what quantity is being specified.
Thus, difficulty in practical measurement is sometimes allowed, but ambiguity in meaning is not.

Each metric will be defined in terms of standard units of measurement.  The international metric system will be used, with the following points specifically noted:
+ When a unit is expressed in simple meters (for distance/length) or seconds (for duration), appropriate related units based on thousands or thousandths of acceptable units are acceptable.  Thus, distances expressed in kilometers (km), durations expressed in milliseconds (ms) or microseconds (us), are allowed, but not centimeters (because the prefix is not in terms of thousands or thousandths).
+ When a unit is expressed in a combination of units, appropriate related units based on thousands or thousandths of acceptable units are acceptable, but all such thousands/thousandths must be grouped at the beginning.  Thus, kilometers per second (km/s) is allowed, but meters per millisecond is not.
+ The unit of information is the bit.
+ When metric prefixes are used with bits or with combinations including bits, those prefixes will have their metric meaning (related to decimal 1000), and not the meaning conventional with computer storage (related to decimal 1024).  In any RFC that defines a metric whose units include bits, this convention will be followed and will be repeated to ensure clarity for the reader.
+ When a time is given, it will be expressed in UTC.

Note that these points apply to the specifications for metrics and not, for example, to packet formats, where octets will likely be used in preference/addition to bits.

Finally, we note that some metrics may be defined purely in terms of other metrics; such metrics are called 'derived metrics'.

5.2. Measurement Methodology

For a given set of well-defined metrics, a number of distinct measurement methodologies may exist.  A partial list includes:
+ Direct measurement of a performance metric using injected test traffic.  Example: measurement of the round-trip delay of an IP packet of a given size over a given route at a given time.
+ Projection of a metric from lower-level measurements.  Example: given accurate measurements of propagation delay and bandwidth for each step along a path, projection of the complete delay for the path for an IP packet of a given size.  (A sketch of this approach appears below.)
+ Estimation of a constituent metric from a set of more aggregated measurements.  Example: given accurate measurements of delay for a given one-hop path for IP packets of different sizes, estimation of propagation delay for the link of that one-hop path.
+ Estimation of a given metric at one time from a set of related metrics at other times.  Example: given an accurate measurement of flow capacity at a past time, together with a set of accurate delay measurements for that past time and the current time, and given a model of flow dynamics, estimate the flow capacity that would be observed at the current time.

This list is by no means exhaustive.  The purpose is to point out the variety of measurement techniques.

When a given metric is specified, a given measurement approach might be noted and discussed.  That approach, however, is not formally part of the specification.
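{Comment: as a concrete illustration of the 'projection' approach in the list above, the following sketch projects the delay of a path for an IP packet of a given size from per-hop propagation delays and bandwidths.  It is illustrative only; the per-hop figures and the packet size are invented and are not part of any metric definition.

    #include <stdio.h>

    /* Illustrative projection of path delay from per-hop measurements:
     * each hop contributes its propagation delay plus the time to
     * transmit the packet at that hop's bandwidth.  All figures below
     * are invented for the sake of the example. */
    int main(void)
    {
        double prop[] = { 0.002, 0.015, 0.001 };    /* seconds per hop */
        double bw[]   = { 1.5e6, 45.0e6, 10.0e6 };  /* bits/second per hop */
        int    nhops  = 3;
        double pkt_bits = 100.0 * 8.0;              /* a 100-byte packet */
        double delay = 0.0;
        int    i;

        for (i = 0; i < nhops; i++)
            delay += prop[i] + pkt_bits / bw[i];

        printf("projected one-way delay: %.6f seconds\n", delay);
        return 0;
    }

Such a projection is, of course, only as sound as the underlying per-hop measurements and the simple store-and-forward model behind the formula.}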
A methodology for a metric should have the property that it is repeatable: if the methodology is used multiple times under identical conditions, it should result in consistent measurements.

Backing off a little from the word 'identical' in the previous paragraph, we could more accurately use the word 'continuity' to describe a property of a given methodology: a methodology for a given metric exhibits continuity if, for small variations in conditions, it results in small variations in the resulting measurements.  Slightly more precisely, for every positive epsilon, there exists a positive delta, such that if two sets of conditions are within delta of each other, then the resulting measurements will be within epsilon of each other.  At this point, this should be taken as a heuristic driving our intuition about one kind of robustness property rather than as a precise notion.

A metric that has at least one methodology that exhibits continuity is said itself to exhibit continuity.

Note that some metrics, such as hop-count along a path, are integer-valued and therefore cannot exhibit continuity in quite the sense given above.

Note further that, in practice, it may not be practical to know (or be able to quantify) the conditions relevant to a measurement at a given time.  For example, since the instantaneous load (in packets to be served) at a given router in a high-speed wide-area network can vary widely over relatively brief periods and will be very hard for an external observer to quantify, various statistics of a given metric may be more repeatable, or may better exhibit continuity.  In that case those particular statistics should be specified when the metric is specified.

Finally, some measurement methodologies may be 'conservative' in the sense that the act of measurement does not modify, or only slightly modifies, the value of the performance metric the methodology attempts to measure.  {Comment: for example, in a wide-area high-speed network under modest load, a test using several small 'ping' packets to measure delay would likely not interfere (much) with the delay properties of that network as observed by others.  The corresponding statement about tests using a large flow to measure flow capacity would likely fail.}

5.3. Measurements, Uncertainties, and Errors

Even the very best measurement methodologies for the very most well behaved metrics will exhibit errors.  Those who develop such measurement methodologies, however, should strive to:
+ minimize their uncertainties/errors,
+ understand and document the sources of uncertainty/error, and
+ quantify the amounts of uncertainty/error.

For example, when developing a method for measuring delay, understand how any errors in your clocks introduce errors into your delay measurement, and quantify this effect as well as you can.  In some cases, this will result in a requirement that a clock be at least up to a certain quality if it is to be used to make a certain measurement.

As a second example, consider the timing error due to measurement overheads within the computer making the measurement, as opposed to delays due to the Internet component being measured.  The former is a measurement error, while the latter reflects the metric of interest.
Note that one technique that can help avoid this overhead is the use of a packet filter/sniffer, running on a separate computer that records network packets and timestamps them accurately (see the discussion of 'wire time' below).  The resulting trace can then be analysed to assess the test traffic, minimising the effect of measurement host delays, or at least allowing those delays to be accounted for.  We note that this technique may prove beneficial even if the packet filter/sniffer runs on the same machine, because such measurements generally provide 'kernel-level' timestamping as opposed to less-accurate 'application-level' timestamping.

Finally, we note that derived metrics (defined above) or metrics that exhibit spatial or temporal composition (defined below) offer particular occasion for the analysis of measurement uncertainties, namely how the uncertainties propagate (conceptually) due to the derivation or composition.

6. Metrics and the Analytical Framework

As the Internet has evolved from the early packet-switching studies of the 1960s, the Internet engineering community has evolved a common analytical framework of concepts.  This analytical framework, or A-frame, used by designers and implementers of protocols, by those involved in measurement, and by those who study computer network performance using the tools of simulation and analysis, has great advantage to our work.  A major objective here is to generate network characterizations that are consistent in both analytical and practical settings, since this will maximize the chances that non-empirical network study can be better correlated with, and used to further our understanding of, real network behavior.

Whenever possible, therefore, we would like to develop and leverage off of the A-frame.  Thus, whenever a metric to be specified is understood to be closely related to concepts within the A-frame, we will attempt to specify the metric in the A-frame's terms.  In such a specification we will develop the A-frame by precisely defining the concepts needed for the metric, then leverage off of the A-frame by defining the metric in terms of those concepts.

Such a metric will be called an 'analytically specified metric' or, more simply, an analytical metric.

{Comment: Examples of such analytical metrics might include:

propagation time of a link
    The time, in seconds, required by a single bit to travel from the output port on one Internet host across a single link to another Internet host.

bandwidth of a link for packets of size k
    The capacity, in bits/second, where only those bits of the IP packet are counted, for packets of size k bytes.

route
    The path, as defined in Section 4, from A to B at a given time.

hop count of a route
    The value 'n' of the route path.
}

Note that we make no a priori list of just what A-frame concepts will emerge in these specifications, but we do encourage their use and urge that they be carefully specified so that, as our set of metrics develops, so will a specified set of A-frame concepts technically consistent with each other and consonant with the common understanding of those concepts within the general Internet community.
385 These A-frame concepts will be intended to abstract from actual 386 Internet components in such a way that: 387 + the essential function of the component is retained, 388 + properties of the component relevant to the metrics we aim to 389 create are retained, 390 + a subset of these component properties are potentially defined as 391 analytical metrics, and 392 + those properties of actual Internet components not relevant to 393 defining the metrics we aim to create are dropped. 395 For example, when considering a router in the context of packet for- 396 warding, we might model the router as a component that receives pack- 397 ets on an input link, queues them on a FIFO packet queue of finite 398 size, employs tail-drop when the packet queue is full, and forwards 399 them on an output link. The transmission speed (in bits/second) of 400 the input and output links, the latency in the router (in seconds), 401 and the maximum size of the packet queue (in bits) are relevant 402 analytical metrics. 404 In some cases, such analytical metrics used in relation to a router 405 will be very closely related to specific metrics of the performance 406 of Internet paths. For example, an obvious formula (L + P/B) involv- 407 ing the latency in the router (L), the packet size (in bits) (P), and 408 the transmission speed of the output link (B) might closely approxi- 409 mate the increase in packet delay due to the insertion of a given 410 router along a path. 412 We stress, however, that well-chosen and well-specified A-frame con- 413 cepts and their analytical metrics will support more general metric 414 creation efforts in less obvious ways. 416 {Comment: for example, when considering the flow capacity of a path, 417 it may be of real value to be able to model each of the routers along 418 the path as packet forwarders as above. Techniques for estimating 419 the flow capacity of a path might use the maximum packet queue size 420 as a parameter in decidedly non-obvious ways. For example, as the 421 maximum queue size increases, so will the ability of the router to 422 continuously move traffic along an output link despite fluctuations 423 in traffic from an input link. Estimating this increase, however, 424 remains a research topic.} 426 Note that, when we specify A-frame concepts and analytical metrics, 427 we will inevitably make simplifying assumptions. The key role of 428 these concepts is to abstract the properties of the Internet com- 429 ponents relevant to given metrics. Judgement is required to avoid 430 making assumptions that bias the modeling and metric effort toward 432 ID Framework for IP Performance Metrics November 1997 434 one kind of design. 436 {Comment: for example, routers might not use tail-drop, even though 437 tail-drop might be easier to model analytically.} 439 Finally, note that different elements of the A-frame might well make 440 different simplifying assumptions. For example, the abstraction of a 441 router used to further the definition of path delay might treat the 442 router's packet queue as a single FIFO queue, but the abstraction of 443 a router used to further the definition of the handling of an RSVP- 444 enabled packet might treat the router's packet queue as supporting 445 bounded delay -- a contradictory assumption. This is not to say that 446 we make contradictory assumptions at the same time, but that two dif- 447 ferent parts of our work might refine the simpler base concept in two 448 divergent ways for different purposes. 
{Comment: in more mathematical terms, we would say that the A-frame taken as a whole need not be consistent; but the set of particular A-frame elements used to define a particular metric must be.}

7. Empirically Specified Metrics

There are useful performance and reliability metrics that do not fit so neatly into the A-frame, usually because the A-frame lacks the detail or power for dealing with them.  For example, "the best flow capacity achievable along a path using an RFC-2001-compliant TCP" would be good to be able to measure, but we have no analytical framework of sufficient richness to allow us to cast that flow capacity as an analytical metric.

These notions can still be well specified by instead describing a reference methodology for measuring them.

Such a metric will be called an 'empirically specified metric', or more simply, an empirical metric.

Such empirical metrics should have three properties:
+ we should have a clear definition for each in terms of Internet components,
+ we should have at least one effective means to measure them, and
+ to the extent possible, we should have a (necessarily incomplete) understanding of the metric in terms of the A-frame so that we can use our measurements to reason about the performance and reliability of A-frame components and of aggregations of A-frame components.

8. Two Forms of Composition

8.1. Spatial Composition of Metrics

In some cases, it may be realistic and useful to define metrics in such a fashion that they exhibit spatial composition.

By spatial composition, we mean a characteristic of some path metrics, in which the metric as applied to a (complete) path can also be defined for various subpaths, and in which the appropriate A-frame concepts for the metric suggest useful relationships between the metric applied to these various subpaths (including the complete path, the various cloud subpaths of a given path digest, and even single routers along the path).  The effectiveness of spatial composition depends:
+ on the usefulness in analysis of these relationships as applied to the relevant A-frame components, and
+ on the practical use of the corresponding relationships as applied to metrics and to measurement methodologies.

{Comment: for example, consider some metric for delay of a 100-byte packet across a path P, and consider further a path digest < h0, e1, C1, ..., en, hn > of P.  The definition of such a metric might include a conjecture that the delay across P is very nearly the sum of the corresponding metric across the exchanges (ei) and clouds (Ci) of the given path digest.  The definition would further include a note on how a corresponding relation applies to relevant A-frame components, both for the path P and for the exchanges and clouds of the path digest.}

When the definition of a metric includes a conjecture that the metric across the path is related to the metric across the subpaths of the path, that conjecture constitutes a claim that the metric exhibits spatial composition.
The definition should then include:
+ the specific conjecture applied to the metric,
+ a justification of the practical utility of the composition in terms of making accurate measurements of the metric on the path,
+ a justification of the usefulness of the composition in terms of making analysis of the path using A-frame concepts more effective, and
+ an analysis of how the conjecture could be incorrect.

8.2. Temporal Composition of Formal Models and Empirical Metrics

In some cases, it may be realistic and useful to define metrics in such a fashion that they exhibit temporal composition.

By temporal composition, we mean a characteristic of some path metrics, in which the metric as applied to a path at a given time T is also defined for various times t0 < t1 < ... < tn < T, and in which the appropriate A-frame concepts for the metric suggest useful relationships between the metric applied at times t0, ..., tn and the metric applied at time T.  The effectiveness of temporal composition depends:
+ on the usefulness in analysis of these relationships as applied to the relevant A-frame components, and
+ on the practical use of the corresponding relationships as applied to metrics and to measurement methodologies.

{Comment: for example, consider a metric for the expected flow capacity across a path P during the five-minute period surrounding the time T, and suppose further that we have the corresponding values for each of the four previous five-minute periods t0, t1, t2, and t3.  The definition of such a metric might include a conjecture that the flow capacity at time T can be estimated from a certain kind of extrapolation from the values of t0, ..., t3.  The definition would further include a note on how a corresponding relation applies to relevant A-frame components.

Note: any (spatial or temporal) compositions involving flow capacity are likely to be subtle, and temporal compositions are generally more subtle than spatial compositions, so the reader should understand that the foregoing example is intentionally naive.}

When the definition of a metric includes a conjecture that the metric across the path at a given time T is related to the metric across the path for a set of other times, that conjecture constitutes a claim that the metric exhibits temporal composition.  The definition should then include:
+ the specific conjecture applied to the metric,
+ a justification of the practical utility of the composition in terms of making accurate measurements of the metric on the path, and
+ a justification of the usefulness of the composition in terms of making analysis of the path using A-frame concepts more effective.

9. Issues related to Time

9.1. Clock Issues

Measurements of time lie at the heart of many Internet metrics.  Because of this, it will often be crucial when designing a methodology for measuring a metric to understand the different types of errors and uncertainties introduced by imperfect clocks.  In this section we define terminology for discussing the characteristics of clocks and touch upon related measurement issues which need to be addressed by any sound methodology.
584 The Network Time Protocol (NTP; RFC 1305) defines a nomenclature for 585 discussing clock characteristics, which we will also use when 586 appropriate [Mi92]. The main goal of NTP is to provide accurate 587 timekeeping over fairly long time scales, such as minutes to days, 588 while for measurement purposes often what is more important is 589 short-term accuracy, between the beginning of the measurement and the 590 end, or over the course of gathering a body of measurements (a sam- 591 ple). This difference in goals sometimes leads to different defini- 592 tions of terminology as well, as discussed below. 594 To begin, we define a clock's "offset" at a particular moment as the 595 difference between the time reported by the clock and the "true" time 596 as defined by UTC. If the clock reports a time Tc and the true time 597 is Tt, then the clock's offset is Tc - Tt. 599 We will refer to a clock as "accurate" at a particular moment if the 600 clock's offset is zero, and more generally a clock's "accuracy" is 601 how close the absolute value of the offset is to zero. For NTP, 602 accuracy also includes a notion of the frequency of the clock; for 603 our purposes, we instead incorporate this notion into that of "skew", 604 because we define accuracy in terms of a single moment in time rather 605 than over an interval of time. 607 A clock's "skew" at a particular moment is the frequency difference 608 (first derivative of its offset with respect to true time) between 609 the clock and true time. 611 As noted in RFC 1305, real clocks exhibit some variation in skew. 612 That is, the second derivative of the clock's offset with respect to 613 true time is generally non-zero. In keeping with RFC 1305, we define 614 this quantity as the clock's "drift". 616 A clock's "resolution" is the smallest unit by which the clock's time 617 is updated. It gives a lower bound on the clock's uncertainty. 618 (Note that clocks can have very fine resolutions and yet be wildly 620 ID Framework for IP Performance Metrics November 1997 622 inaccurate.) Resolution is defined in terms of seconds. However, 623 resolution is relative to the clock's reported time and not to true 624 time, so for example a resolution of 10 ms only means that the clock 625 updates its notion of time in 0.01 second increments, not that this 626 is the true amount of time between updates. 628 {Comment: Systems differ on how an application interface to the clock 629 reports the time on subsequent calls during which the clock has not 630 advanced. Some systems simply return the same unchanged time as 631 given for previous calls. Others may add a small increment to the 632 reported time to maintain monotonic increasing timestamps. For sys- 633 tems that do the latter, we do *not* consider these small increments 634 when defining the clock's resolution. They are instead an impediment 635 to assessing the clock's resolution, since a natural method for doing 636 so is to repeatedly query the clock to determine the smallest non- 637 zero difference in reported times.} 639 It is expected that a clock's resolution changes only rarely (for 640 example, due to a hardware upgrade). 642 There are a number of interesting metrics for which some natural 643 measurement methodologies involve comparing times reported by two 644 different clocks. An example is one-way packet delay (currently an 645 Internet Draft [AK96]). 
Here, the time required for a packet to travel through the network is measured by comparing the time reported by a clock at one end of the packet's path, corresponding to when the packet first entered the network, with the time reported by a clock at the other end of the path, corresponding to when the packet finished traversing the network.

We are thus also interested in terminology for describing how two clocks C1 and C2 compare.  To do so, we introduce terms related to those above in which the notion of "true time" is replaced by the time as reported by clock C1.  For example, clock C2's offset relative to C1 at a particular moment is Tc2 - Tc1, the instantaneous difference in time reported by C2 and C1.  To disambiguate between the use of the terms to compare two clocks versus the use of the terms to compare to true time, we will in the former case use the phrase "relative".  So the offset defined earlier in this paragraph is the "relative offset" between C2 and C1.

When comparing clocks, the analog of "resolution" is not "relative resolution", but instead "joint resolution", which is the sum of the resolutions of C1 and C2.  The joint resolution then indicates a conservative lower bound on the accuracy of any time intervals computed by subtracting timestamps generated by one clock from those generated by the other.

If two clocks are "accurate" with respect to one another (their relative offset is zero), we will refer to the pair of clocks as "synchronized".  Note that clocks can be highly synchronized yet arbitrarily inaccurate in terms of how well they tell true time.  This point is important because for many Internet measurements, synchronization between two clocks is more important than the accuracy of the clocks.  This is somewhat true of skew, too: as long as the absolute skew is not too great, minimal relative skew is more important, since relative skew can induce systematic trends in packet transit times measured by comparing timestamps produced by the two clocks.

These distinctions arise because for Internet measurement what is often most important are differences in time as computed by comparing the output of two clocks.  The process of computing the difference removes any error due to clock inaccuracies with respect to true time; but it is crucial that the differences themselves accurately reflect differences in true time.

Measurement methodologies will often begin with the step of assuring that two clocks are synchronized and have minimal skew and drift.  {Comment: An effective way to assure these conditions (and also clock accuracy) is by using clocks that derive their notion of time from an external source, rather than only the host computer's clock.  (These latter are often subject to large errors.)
It is further preferable 696 that the clocks directly derive their time, for example by having 697 immediate access to a GPS (Global Positioning System) unit.} 699 Two important concerns arise if the clocks indirectly derive their 700 time using a network time synchronization protocol such as NTP: 701 + First, NTP's accuracy depends in part on the properties (particu- 702 larly delay) of the Internet paths used by the NTP peers, and 703 these might be exactly the properties that we wish to measure, so 704 it would be unsound to use NTP to calibrate such measurements. 705 + Second, NTP focuses on clock accuracy, which can come at the 706 expense of short-term clock skew and drift. For example, when a 707 host's clock is indirectly synchronized to a time source, if the 708 synchronization intervals occur infrequently, then the host will 709 sometimes be faced with the problem of how to adjust its current, 710 incorrect time, Ti, with a considerably different, more accurate 711 time it has just learned, Ta. Two general ways in which this is 712 done are to either immediately set the current time to Ta, or to 713 adjust the local clock's update frequency (hence, its skew) so 714 that at some point in the future the local time Ti' will agree 715 with the more accurate time Ta'. The first mechanism introduces 716 discontinuities and can also violate common assumptions that 717 timestamps are monotone increasing. If the host's clock is set 718 backward in time, sometimes this can be easily detected. If the 719 clock is set forward in time, this can be harder to detect. The 721 ID Framework for IP Performance Metrics November 1997 723 skew induced by the second mechanism can lead to considerable 724 inaccuracies when computing differences in time, as discussed 725 above. 727 To illustrate why skew is a crucial concern, consider samples of 728 one-way delays between two Internet hosts made at one minute inter- 729 vals. The true transmission delay between the hosts might plausibly 730 be on the order of 50 ms for a transcontinental path. If the skew 731 between the two clocks is 0.01%, that is, 1 part in 10,000, then 732 after 10 minutes of observation the error introduced into the meas- 733 urement is 60 ms. Unless corrected, this error is enough to com- 734 pletely wipe out any accuracy in the transmission delay measurement. 735 Finally, we note that assessing skew errors between unsynchronized 736 network clocks is an open research area. (See [Pa97] for a discus- 737 sion of detecting and compensating for these sorts of errors.) This 738 shortcoming makes use of a solid, independent clock source such as 739 GPS especially desirable. 741 9.2. The Notion of "Wire Time" 743 Internet measurement is often complicated by the use of Internet 744 hosts themselves to perform the measurement. These hosts can intro- 745 duce delays, bottlenecks, and the like that are due to hardware or 746 operating system effects and have nothing to do with the network 747 behavior we would like to measure. This problem is particularly 748 acute when timestamping of network events occurs at the application 749 level. 751 In order to provide a general way of talking about these effects, we 752 introduce two notions of "wire time". These notions are only defined 753 in terms of an Internet host H observing an Internet link L at a par- 754 ticular location: 755 + For a given packet P, the 'wire arrival time' of P at H on L is 756 the first time T at which any bit of P has appeared at H's obser- 757 vational position on L. 
758 + For a given packet P, the 'wire exit time' of P at H on L is the 759 first time T at which all the bits of P have appeared at H's 760 observational position on L. 761 Note that intrinsic to the definition is the notion of where on the 762 link we are observing. This distinction is important because for 763 large-latency links, we may obtain very different times depending on 764 exactly where we are observing the link. We could allow the observa- 765 tional position to be an arbitrary location along the link; however, 766 we define it to be in terms of an Internet host because we anticipate 767 in practice that, for IPPM metrics, all such timing will be con- 768 strained to be performed by Internet hosts, rather than specialized 769 hardware devices that might be able to monitor a link at locations 771 ID Framework for IP Performance Metrics November 1997 773 where a host cannot. This definition also takes care of the problem 774 of links that are comprised of multiple physical channels. Because 775 these multiple channels are not visible at the IP layer, they cannot 776 be individually observed in terms of the above definitions. 778 It is possible, though one hopes uncommon, that a packet P might make 779 multiple trips over a particular link L, due to a forwarding loop. 780 These trips might even overlap, depending on the link technology. 781 Whenever this occurs, we define a separate wire time associated with 782 each instance of P seen at H's position on the link. This definition 783 is worth making because it serves as a reminder that notions like 784 *the* unique time a packet passes a point in the Internet are 785 inherently slippery. 787 The term wire time has historically been used to loosely denote the 788 time at which a packet appeared on a link, without exactly specifying 789 whether this refers to the first bit, the last bit, or some other 790 consideration. This informal definition is generally already very 791 useful, as it is usually used to make a distinction between when the 792 packet's propagation delays begin and cease to be due to the network 793 rather than the endpoint hosts. 795 When appropriate, metrics should be defined in terms of wire times 796 rather than host endpoint times, so that the metric's definition 797 highlights the issue of separating delays due to the host from those 798 due to the network. 800 We note that one potential difficulty when dealing with wire times 801 concerns IP fragments. It may be the case that, due to fragmenta- 802 tion, only a portion of a particular packet passes by H's location. 803 Such fragments are themselves legitimate packets and have well- 804 defined wire times associated with them; but the larger IP packet 805 corresponding to their aggregate may not. 807 We also note that these notions have not, to our knowledge, been pre- 808 viously defined in exact terms for Internet traffic. Consequently, 809 we may find with experience that these definitions require some 810 adjustment in the future. 812 {Comment: It can sometimes be difficult to measure wire times. One 813 technique is to use a packet filter to monitor traffic on a link. 814 The architecture of these filters often attempts to associate with 815 each packet a timestamp as close to the wire time as possible. We 816 note however that one common source of error is to run the packet 817 filter on one of the endpoint hosts. 
In this case, it has been 818 observed that some packet filters receive for some packets timestamps 819 corresponding to when the packet was *scheduled* to be injected into 820 the network, rather than when it actually was *sent* out onto the 822 ID Framework for IP Performance Metrics November 1997 824 network (wire time). There can be a substantial difference between 825 these two times. A technique for dealing with this problem is to run 826 the packet filter on a separate host that passively monitors the 827 given link. This can be problematic however for some link technolo- 828 gies. See [Pa97] for a discussion of the sorts of errors packet 829 filters can exhibit. Finally, we note that packet filters will often 830 only capture the first fragment of a fragmented IP packet, due to the 831 use of filtering on fields in the IP and transport protocol headers. 832 As we generally desire our measurement methodologies to avoid the 833 complexity of creating fragmented traffic, one strategy for dealing 834 with their presence as detected by a packet filter is to flag that 835 the measured traffic has an unusual form and abandon further analysis 836 of the packet timing.} 838 10. Singletons, Samples, and Statistics 840 With experience we have found it useful to introduce a separation 841 between three distinct -- yet related -- notions: 842 + By a 'singleton' metric, we refer to metrics that are, in a sense, 843 atomic. For example, a single instance of "bulk throughput capa- 844 city" from one host to another might be defined as a singleton 845 metric, even though the instance involves measuring the timing of 846 a number of Internet packets. 847 + By a 'sample' metric, we refer to metrics derived from a given 848 singleton metric by taking a number of distinct instances 849 together. For example, we might define a sample metric of one-way 850 delays from one host to another as an hour's worth of measure- 851 ments, each made at Poisson intervals with a mean spacing of one 852 second. 853 + By a 'statistical' metric, we refer to metrics derived from a 854 given sample metric by computing some statistic of the values 855 defined by the singleton metric on the sample. For example, the 856 mean of all the one-way delay values on the sample given above 857 might be defined as a statistical metric. 858 By applying these notions of singleton, sample, and statistic in a 859 consistent way, we will be able to reuse lessons learned about how to 860 define samples and statistics on various metrics. The orthogonality 861 among these three notions will thus make all our work more effective 862 and more intelligible by the community. 864 In the remainder of this section, we will cover some topics in sam- 865 pling and statistics that we believe will be important to a variety 866 of metric definitions and measurement efforts. 868 ID Framework for IP Performance Metrics November 1997 870 10.1. Methods of Collecting Samples 872 The main reason for collecting samples is to see what sort of varia- 873 tions and consistencies are present in the metric being measured. 874 These variations might be with respect to different points in the 875 Internet, or different measurement times. When assessing variations 876 based on a sample, one generally makes an assumption that the sample 877 is "unbiased", meaning that the process of collecting the measure- 878 ments in the sample did not skew the sample so that it no longer 879 accurately reflects the metric's variations and consistencies. 
881 One common way of collecting samples is to make measurements 882 separated by fixed amounts of time: periodic sampling. Periodic sam- 883 pling is particularly attractive because of its simplicity, but it 884 suffers from two potential problems: 885 + If the metric being measured itself exhibits periodic behavior, 886 then there is a possibility that the sampling will observe only 887 part of the periodic behavior if the periods happen to agree 888 (either directly, or if one is a multiple of the other). Related 889 to this problem is the notion that periodic sampling can be easily 890 anticipated. Predictable sampling is susceptible to manipulation 891 if there are mechanisms by which a network component's behavior 892 can be temporarily changed such that the sampling only sees the 893 modified behavior. 894 + The act of measurement can perturb what is being measured (for 895 example, injecting measurement traffic into a network alters the 896 congestion level of the network), and repeated periodic perturba- 897 tions can drive a network into a state of synchronization (cf. 898 [FJ94]), greatly magnifying what might individually be minor 899 effects. 901 A more sound approach is based on "random additive sampling": samples 902 are separated by independent, randomly generated intervals that have 903 a common statistical distribution G(t) [BM92]. The quality of this 904 sampling depends on the distribution G(t). For example, if G(t) gen- 905 erates a constant value g with probability one, then the sampling 906 reduces to periodic sampling with a period of g. 908 Random additive sampling gains significant advantages. In general, 909 it avoids synchronization effects and yields an unbiased estimate of 910 the property being sampled. The only significant drawbacks with it 911 are: 912 + it complicates frequency-domain analysis, because the samples do 913 not occur at fixed intervals such as assumed by Fourier-transform 914 techniques; and 916 ID Framework for IP Performance Metrics November 1997 918 + unless G(t) is the exponential distribution (see below), sampling 919 still remains somewhat predictable, as discussed for periodic sam- 920 pling above. 922 10.1.1. Poisson Sampling 924 It can be proved that if G(t) is an exponential distribution with 925 rate lambda, that is 926 G(t) = 1 - exp(-lambda * t) 927 then the arrival of new samples *cannot* be predicted (and, again, 928 the sampling is unbiased). Furthermore, the sampling is asymptoti- 929 cally unbiased even if the act of sampling affects the network's 930 state. Such sampling is referred to as "Poisson sampling". It is 931 not prone to inducing synchronization, it can be used to accurately 932 collect measurements of periodic behavior, and it is not prone to 933 manipulation by anticipating when new samples will occur. 935 Because of these valuable properties, we in general prefer that sam- 936 ples of Internet measurements are gathered using Poisson sampling. 937 {Comment: We note, however, that there may be circumstances that 938 favor use of a different G(t). For example, the exponential distri- 939 bution is unbounded, so its use will on occasion generate lengthy 940 spaces between sampling times. We might instead desire to bound the 941 longest such interval to a maximum value dT, to speed the convergence 942 of the estimation derived from the sampling. This could be done by 943 using 944 G(t) = Unif(0, dT) 945 that is, the uniform distribution between 0 and dT. 
This sampling, 946 of course, becomes highly predictable if an interval of nearly length 947 dT has elapsed without a sample occurring.} 949 In its purest form, Poisson sampling is done by generating indepen- 950 dent, exponentially distributed intervals and gathering a single 951 measurement after each interval has elapsed. It can be shown that if 952 starting at time T one performs Poisson sampling over an interval dT, 953 during which a total of N measurements happen to be made, then those 954 measurements will be uniformly distributed over the interval [T, 955 T+dT]. So another way of conducting Poisson sampling is to pick dT 956 and N and generate N random sampling times uniformly over the inter- 957 val [T, T+dT]. The two approaches are equivalent, except if N and dT 958 are externally known. In that case, the property of not being able 959 to predict measurement times is weakened (the other properties still 960 hold). The N/dT approach has an advantage that dealing with fixed 961 values of N and dT can be simpler than dealing with a fixed lambda 962 but variable numbers of measurements over variably-sized intervals. 964 ID Framework for IP Performance Metrics November 1997 966 10.1.2. Geometric Sampling 968 Closely related to Poisson sampling is "geometric sampling", in which 969 external events are measured with a fixed probability p. For exam- 970 ple, one might capture all the packets over a link but only record 971 the packet to a trace file if a randomly generated number uniformly 972 distributed between 0 and 1 is less than a given p. Geometric sam- 973 pling has the same properties of being unbiased and not predictable 974 in advance as Poisson sampling, so if it fits a particular Internet 975 measurement task, it too is sound. See [CPB93] for more discussion. 977 10.1.3. Generating Poisson Sampling Intervals 979 To generate Poisson sampling intervals, one first determines the rate 980 lambda at which the singleton measurements will on average be made 981 (e.g., for an average sampling interval of 30 seconds, we have lambda 982 = 1/30, if the units of time are seconds). One then generates a 983 series of exponentially-distributed (pseudo-)random numbers E1, E2, 984 ..., En. The first measurement is made at time E1, the next at time 985 E1+E2, and so on. 987 One technique for generating exponentially-distributed (pseudo- 988 )random numbers is based on the ability to generate U1, U2, ..., Un, 989 (pseudo-)random numbers that are uniformly distributed between 0 and 990 1. Many computers provide libraries that can do this. Given such 991 Ui, to generate Ei one uses: 992 Ei = -log(Ui) / lambda 993 where log(Ui) is the natural logarithm of Ui. {Comment: This tech- 994 nique is an instance of the more general "inverse transform" method 995 for generating random numbers with a given distribution.} 997 Implementation details: 999 There are at least three different methods for approximating Poisson 1000 sampling, which we describe here as Methods 1 through 3. Method 1 is 1001 the easiest to implement and has the most error, and method 3 is the 1002 most difficult to implement and has the least error (potentially 1003 none). 1005 Method 1 is to proceed as follows: 1006 1. Generate E1 and wait that long. 1007 2. Perform a measurement. 1008 3. Generate E2 and wait that long. 1009 4. Perform a measurement. 1010 5. Generate E3 and wait that long. 1011 6. Perform a measurement ... 
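{Comment: the following sketch shows the interval-generation step and a loop in the style of Method 1, written in C and assuming a POSIX system.  drand48() supplies the uniform (pseudo-)random numbers Ui; the choice of lambda and the seed are purely illustrative, and the 'measurement' itself is only a placeholder.

    #include <math.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    /* Inverse-transform generation of an exponential interval:
     * Ei = -log(Ui) / lambda, with Ui uniform on (0, 1). */
    static double exp_interval(double lambda)
    {
        double u;
        do {
            u = drand48();
        } while (u == 0.0);              /* avoid log(0) */
        return -log(u) / lambda;
    }

    int main(void)
    {
        double lambda = 1.0 / 30.0;      /* mean sampling interval: 30 s */
        int i;

        srand48(42);                     /* illustrative seed only */
        for (i = 0; i < 5; i++) {
            double e = exp_interval(lambda);
            struct timespec ts;

            ts.tv_sec  = (time_t)e;
            ts.tv_nsec = (long)((e - (double)ts.tv_sec) * 1e9);
            nanosleep(&ts, NULL);        /* "generate Ei and wait that long" */
            printf("measurement %d\n", i);   /* "perform a measurement" */
        }
        return 0;
    }

Note that this sketch inherits the limitation of Method 1 discussed next: the time spent performing the measurement itself is not accounted for.}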
1013 ID Framework for IP Performance Metrics November 1997 1015 The problem with this approach is that the "Perform a measurement" 1016 steps themselves take time, so the sampling is not done at times E1, 1017 E1+E2, etc., but rather at E1, E1+M1+E2, etc., where Mi is the amount 1018 of time required for the i'th measurement. If Mi is very small com- 1019 pared to 1/lambda then the potential error introduced by this tech- 1020 nique is likewise small. As Mi becomes a non-negligible fraction of 1021 1/lambda, the potential error increases. 1023 Method 2 attempts to correct this error by taking into account the 1024 amount of time required by the measurements (i.e., the Mi's) and 1025 adjusting the waiting intervals accordingly: 1026 1. Generate E1 and wait that long. 1027 2. Perform a measurement and measure M1, the time it took to do so. 1028 3. Generate E2 and wait for a time E2-M1. 1029 4. Perform a measurement and measure M2 .. 1031 This approach works fine as long as E{i+1} >= Mi. But if E{i+1} < Mi 1032 then it is impossible to wait the proper amount of time. (Note that 1033 this case corresponds to needing to perform two measurements simul- 1034 taneously.) 1036 Method 3 is generating a schedule of measurement times E1, E1+E2, 1037 etc., and then sticking to it: 1038 1. Generate E1, E2, ..., En. 1039 2. Compute measurement times T1, T2, ..., Tn, as Ti = E1 + ... + Ei. 1040 3. Arrange that at times T1, T2, ..., Tn, a measurement is made. 1042 By allowing simultaneous measurements, Method 3 avoids the shortcom- 1043 ings of Methods 1 and 2. If, however, simultaneous measurements 1044 interfere with one another, then Method 3 does not gain any benefit 1045 and may actually prove worse than Methods 1 or 2. 1047 For Internet phenomena, it is not known to what degree the inaccura- 1048 cies of these methods are significant. If the Mi's are much less 1049 than 1/lambda, then any of the three should suffice. If the Mi's are 1050 less than 1/lambda but perhaps not greatly less, then Method 2 is 1051 preferred to Method 1. If simultaneous measurements do not interfere 1052 with one another, then Method 3 is preferred, though it can be con- 1053 siderably harder to implement. 1055 10.2. Self-Consistency 1057 A fundamental requirement for a sound measurement methodology is that 1058 measurement be made using as few unconfirmed assumptions as possible. 1059 Experience has painfully shown how easy it is to make an (often 1060 implicit) assumption that turns out to be incorrect. An example is 1061 incorporating into a measurement the reading of a clock synchronized 1063 ID Framework for IP Performance Metrics November 1997 1065 to a highly accurate source. It is easy to assume that the clock is 1066 therefore accurate; but due to software bugs, a loss of power in the 1067 source, or a loss of communication between the source and the clock, 1068 the clock could actually be quite inaccurate. 1070 This is not to argue that one must not make *any* assumptions when 1071 measuring, but rather that, to the extent which is practical, assump- 1072 tions should be tested. One powerful way for doing so involves 1073 checking for self-consistency. Such checking applies both to the 1074 observed value(s) of the measurement *and the values used by the 1075 measurement process itself*. A simple example of the former is that 1076 when computing a round trip time, one should check to see if it is 1077 negative. 
Since negative time intervals are non-physical, if it ever 1078 is negative that finding immediately flags an error. *These sorts of 1079 errors should then be investigated!* It is crucial to determine where 1080 the error lies, because only by doing so diligently can we build up 1081 faith in a methodology's fundamental soundness. For example, it 1082 could be that the round trip time is negative because during the 1083 measurement the clock was set backward in the process of synchroniz- 1084 ing it with another source. But it could also be that the measure- 1085 ment program accesses uninitialized memory in one of its computations 1086 and, only very rarely, that leads to a bogus computation. This 1087 second error is more serious, if the same program is used by others 1088 to perform the same measurement, since then they too will suffer from 1089 incorrect results. Furthermore, once uncovered it can be completely 1090 fixed. 1092 A more subtle example of testing for self-consistency comes from 1093 gathering samples of one-way Internet delays. If one has a large 1094 sample of such delays, it may well be highly telling to, for example, 1095 fit a line to the pairs of (time of measurement, measured delay), to 1096 see if the resulting line has a clearly non-zero slope. If so, a 1097 possible interpretation is that one of the clocks used in the meas- 1098 urements is skewed relative to the other. Another interpretation is 1099 that the slope is actually due to genuine network effects. Determin- 1100 ing which is indeed the case will often be highly illuminating. (See 1101 [Pa97] for a discussion of distinguishing between relative clock skew 1102 and genuine network effects.) Furthermore, if making this check is 1103 part of the methodology, then a finding that the long-term slope is 1104 very near zero is positive evidence that the measurements are prob- 1105 ably not biased by a difference in skew. 1107 A final example illustrates checking the measurement process itself 1108 for self-consistency. Above we outline Poisson sampling techniques, 1109 based on generating exponentially-distributed intervals. A sound 1110 measurement methodology would include testing the generated intervals 1111 to see whether they are indeed exponentially distributed (and also to 1112 see if they suffer from correlation). In the appendix we discuss and 1114 ID Framework for IP Performance Metrics November 1997 1116 give C code for one such technique, a general-purpose, well-regarded 1117 goodness-of-fit test called the Anderson-Darling test. 1119 Finally, we note that what is truly relevant for Poisson sampling of 1120 Internet metrics is often not when the measurements began but the 1121 wire times corresponding to the measurement process. These could 1122 well be different, due to complications on the hosts used to perform 1123 the measurement. Thus, even those with complete faith in their 1124 pseudo-random number generators and subsequent algorithms are 1125 encouraged to consider how they might test the assumptions of each 1126 measurement procedure as much as possible. 1128 10.3. Defining Statistical Distributions 1130 One way of describing a collection of measurements (a sample) is as a 1131 statistical distribution -- informally, as percentiles. There are 1132 several slightly different ways of doing so. In this section we 1133 define a standard definition to give uniformity to these descrip- 1134 tions. 
The "empirical distribution function" (EDF) of a set of scalar
measurements is a function F(x) which for any x gives the fractional
proportion of the total measurements that were <= x.  If x is less
than the minimum value observed, then F(x) is 0.  If it is greater
than or equal to the maximum value observed, then F(x) is 1.

For example, given the 6 measurements:
    -2, 7, 7, 4, 18, -5
we have F(-8) = 0, F(-5) = 1/6, F(-5.0001) = 0, F(-4.999) = 1/6,
F(7) = 5/6, F(18) = 1, F(239) = 1.

Note that we can recover from F(x) the different measured values and
how many times each occurred -- no information regarding the range in
values is lost.  Summarizing measurements using histograms, on the
other hand, in general loses information about the different values
observed, so the EDF is preferred.

Using either the EDF or a histogram, however, we do lose information
regarding the order in which the values were observed.  Whether this
loss is potentially significant will depend on the metric being
measured.

We will use the term "percentile" to refer to the smallest value of x
for which F(x) >= a given percentage.  So the 50th percentile of the
example above is 4, since F(4) = 3/6 = 50%; the 25th percentile is
-2, since F(-5) = 1/6 < 25%, and F(-2) = 2/6 >= 25%; the 100th
percentile is 18; the 15th percentile is -5, since F(-5) = 1/6 >=
15%; and the 0th percentile is -infinity.

Care must be taken when using percentiles to summarize a sample,
because they can lend an unwarranted appearance of more precision
than is really available.  Any such summary MUST include the sample
size N, because any percentile difference finer than 1/N is below the
resolution of the sample.

See [DS86] for more details regarding EDF's.

We close with a note on the common (and important!) notion of median.
In statistics, the median of a distribution is defined to be the
point X for which the probability of observing a value <= X is equal
to the probability of observing a value > X.  When estimating the
median of a set of observations, the estimate depends on whether the
number of observations, N, is odd or even:
+ If N is odd, then the 50th percentile as defined above is used as
  the estimated median.
+ If N is even, then the estimated median is the average of the
  central two observations; that is, if the observations are sorted
  in ascending order and numbered from 1 to N, where N = 2*K, then
  the estimated median is the average of the (K)'th and (K+1)'th
  observations.
Usually the term "estimated" is dropped from the phrase "estimated
median" and this value is simply referred to as the "median".

10.4. Testing For Goodness-of-Fit

For some forms of measurement calibration we need to test whether a
set of numbers is consistent with those numbers having been drawn
from a particular distribution.  An example is that to apply a self-
consistency check to measurements made using a Poisson process, one
test is to see whether the spacing between the sampling times does
indeed reflect an exponential distribution; or, if the N/dT approach
discussed above was used, whether the times are uniformly distributed
across [T, T+dT].
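Using the Anderson-Darling routines given in the appendix, such a
check can be phrased in a small amount of code.  The sketch below is
illustrative only: exp_A2_known_mean() is the appendix routine (it
returns a significance level, or -1 if the test cannot be applied, and
sorts the array as a side effect), while check_poisson_gaps(), the
mean_gap parameter, and the 5% threshold policy are hypothetical
choices made here for illustration, not part of any defined metric.

    #include <stdio.h>

    /* From the appendix: A^2 significance level, or -1 if the test
     * could not be applied; sorts x[0..n-1] as a side effect.
     */
    extern double exp_A2_known_mean(double x[], int n, double mean);

    /* Illustrative self-check: are the recorded gaps between
     * sampling times consistent with an exponential distribution of
     * the intended mean (1/lambda)?  Returns 1 if consistent at 5%
     * significance, 0 if not, -1 if the test could not be applied.
     */
    int
    check_poisson_gaps(double gaps[], int n, double mean_gap)
    {
        double sig = exp_A2_known_mean(gaps, n, mean_gap);

        if (sig < 0.0)
            return -1;          /* e.g., fewer than 5 values */
        if (sig < 0.05) {
            fprintf(stderr,
                "sampling gaps inconsistent with exponential "
                "distribution (significance %.3f)\n", sig);
            return 0;
        }
        return 1;
    }

For the N/dT approach, unif_A2_known_range() would be used
analogously, testing the sampling times against the uniform
distribution over [T, T+dT].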
1203 {Comment: There are at least three possible sets of values we could 1204 test: the scheduled packet transmission times, as determined by use 1205 of a pseudo-random number generator; user-level timestamps made just 1206 before or after the system call for transmitting the packet; and wire 1207 times for the packets as recorded using a packet filter. All three 1208 of these are potentially informative: failures for the scheduled 1209 times to match an exponential distribution indicate inaccuracies in 1210 the random number generation; failures for the user-level times indi- 1211 cate inaccuracies in the timers used to schedule transmission; and 1212 failures for the wire times indicate inaccuracies in actually 1214 ID Framework for IP Performance Metrics November 1997 1216 transmitting the packets, perhaps due to contention for a shared 1217 resource.} 1219 There are a large number of statistical goodness-of-fit techniques 1220 for performing such tests. See [DS86] for a thorough discussion. 1221 That reference recommends the Anderson-Darling EDF test as being a 1222 good all-purpose test, as well as one that is especially good at 1223 detecting deviations from a given distribution in the lower and upper 1224 tails of the EDF. 1226 It is important to understand that the nature of goodness-of-fit 1227 tests is that one first selects a "significance level", which is the 1228 probability that the test will erroneously declare that the EDF of a 1229 given set of measurements fails to match a particular distribution 1230 when in fact the measurements do indeed reflect that distribution. 1231 Unless otherwise stated, IPPM goodness-of-fit tests are done using 5% 1232 significance. This means that if the test is applied to 100 samples 1233 and 5 of those samples are deemed to have failed the test, then the 1234 samples are all consistent with the distribution being tested. If 1235 significantly more of the samples fail the test, then the assumption 1236 that the samples are consistent with the distribution being tested 1237 must be rejected. If significantly fewer of the samples fail the 1238 test, then the samples have potentially been doctored too well to fit 1239 the distribution. Similarly, some goodness-of-fit tests (including 1240 Anderson-Darling) can detect whether it is likely that a given sample 1241 was doctored. We also use a significance of 5% for this case; that 1242 is, the test will report that a given honest sample is "too good to 1243 be true" 5% of the time, so if the test reports this finding signifi- 1244 cantly more often than one time out of twenty, it is an indication 1245 that something unusual is occurring. 1247 The appendix gives sample C code for implementing the Anderson- 1248 Darling test, as well as further discussing its use. 1250 See [Pa94] for a discussion of goodness-of-fit and closeness-of-fit 1251 tests in the context of network measurement. 1253 11. Avoiding Stochastic Metrics 1255 When defining metrics applying to a path, subpath, cloud, or other 1256 network element, we in general do not define them in stochastic terms 1257 (probabilities). We instead prefer a deterministic definition. So, 1258 for example, rather than defining a metric about a "packet loss pro- 1259 bability between A and B", we would define a metric about a "packet 1260 loss rate between A and B". (A measurement given by the first defin- 1261 ition might be "0.73", and by the second "73 packets out of 100".) 
1263 ID Framework for IP Performance Metrics November 1997 1265 The reason for this distinction is as follows. When definitions are 1266 made in terms of probabilities, there are often hidden assumptions in 1267 the definition about a stochastic model of the behavior being meas- 1268 ured. The fundamental goal with avoiding probabilities in our metric 1269 definitions is to avoid biasing our definitions by these hidden 1270 assumptions. 1272 For example, an easy hidden assumption to make is that packet loss in 1273 a network component due to queueing overflows can be described as 1274 something that happens to any given packet with a particular proba- 1275 bility. Usually, however, queueing drops are actually *determinis- 1276 tic*, and assuming that they should be described probabilistically 1277 can obscure crucial correlations between queueing drops among a set 1278 of packets. So it's better to explicitly note stochastic assump- 1279 tions, rather than have them sneak into our definitions implicitly. 1281 This does *not* mean that we abandon stochastic models for under- 1282 standing network performance! It only means that when defining IP 1283 metrics we avoid terms such as "probability" for terms like "propor- 1284 tion" or "rate". We will still use, for example, random sampling in 1285 order to estimate probabilities used by stochastic models related to 1286 the IP metrics. We also do not rule out the possibility of stochas- 1287 tic metrics when they are truly appropriate (for example, perhaps to 1288 model transmission errors caused by certain types of line noise). 1290 12. Packets of Type P 1292 A fundamental property of many Internet metrics is that the value of 1293 the metric depends on the type of IP packet(s) used to make the meas- 1294 urement. Consider an IP-connectivity metric: one obtains different 1295 results depending on whether one is interested in connectivity for 1296 packets destined for well-known TCP ports or unreserved UDP ports, or 1297 those with invalid IP checksums, or those with TTL's of 16, for exam- 1298 ple. In some circumstances these distinctions will be highly 1299 interesting (for example, in the presence of firewalls, or RSVP 1300 reservations). 1302 Because of this distinction, we introduce the generic notion of a 1303 "packet of type P", where in some contexts P will be explicitly 1304 defined (i.e., exactly what type of packet we mean), partially 1305 defined (e.g., "with a payload of B octets"), or left generic. Thus 1306 we may talk about generic IP-type-P-connectivity or more specific 1307 IP-port-HTTP-connectivity. Some metrics and methodologies may be 1308 fruitfully defined using generic type P definitions which are then 1309 made specific when performing actual measurements. 1311 Whenever a metric's value depends on the type of the packets involved 1313 ID Framework for IP Performance Metrics November 1997 1315 in the metric, the metric's name will include either a specific type 1316 or a phrase such as "type-P". Thus we will not define an "IP- 1317 connectivity" metric but instead an "IP-type-P-connectivity" metric 1318 and/or perhaps an "IP-port-HTTP-connectivity" metric. This naming 1319 convention serves as an important reminder that one must be conscious 1320 of the exact type of traffic being measured. 1322 A closely related note: it would be very useful to know if a given 1323 Internet component treats equally a class C of different types of 1324 packets. 
If so, then any one of those types of packets can be used for
subsequent measurement of the component.  This suggests we devise a
metric or suite of metrics that attempt to determine C.

13. Internet Addresses vs. Hosts

When considering a metric for some path through the Internet, it is
often natural to think about it as being for the path from Internet
host H1 to host H2.  A definition in these terms, though, can be
ambiguous, because Internet hosts can be attached to more than one
network.  In this case, the result of the metric will depend on which
of these networks is actually used.

Because of this ambiguity, usually such definitions should instead be
in terms of Internet IP addresses.  For the common case of a
unidirectional path through the Internet, we will use the term "Src"
to denote the IP address of the beginning of the path, and "Dst" to
denote the IP address of the end.

14. Standard-Formed Packets

Unless otherwise stated, all metric definitions that concern IP
packets include an implicit assumption that the packet is *standard
formed*.  A packet is standard formed if it meets all of the
following criteria:
+ Its length as given in the IP header corresponds to the size of
  the IP header plus the size of the payload.
+ It includes a valid IP header: the version field is 4 (later, we
  will expand this to include 6); the header length is >= 5; the
  checksum is correct.
+ It is not an IP fragment.
+ The source and destination addresses correspond to the hosts in
  question.
+ Either the packet possesses sufficient TTL to travel from the
  source to the destination if the TTL is decremented by one at each
  hop, or it possesses the maximum TTL of 255.
+ It does not contain IP options unless explicitly noted.
+ If a transport header is present, it too contains a valid checksum
  and other valid fields.
We further require that if a packet is described as having a "length
of B octets", then 0 <= B <= 65535; and if B is the payload length in
octets, then B <= (65535 - IP header size in octets).

So, for example, one might imagine defining an IP connectivity metric
as "IP-type-P-connectivity for standard-formed packets with the IP
TOS field set to 0", or, more succinctly, "IP-type-P-connectivity
with the IP TOS field set to 0", since standard-formed is already
implied by convention.

A particular type of standard-formed packet often useful to consider
is the "minimal IP packet from A to B" -- this is an IP packet with
the following properties:
+ It is standard-formed.
+ Its data payload is 0 octets.
+ It contains no options.
(Note that we do not define its protocol field, as different values
may lead to different treatment by the network.)

When defining IP metrics we keep in mind that no packet smaller or
simpler than this can be transmitted over a correctly operating IP
network.

15. Acknowledgements

The comments of Brian Carpenter, Bill Cerveny, Padma Krishnaswamy,
Jeff Sedayao, and Howard Stanislevic are appreciated.

16. Security Considerations

This memo raises no security issues.

17.
Appendix 1402 Below we give routines written in C for computing the Anderson- 1403 Darling test statistic (A2) for determining whether a set of values 1404 is consistent with a given statistical distribution. Externally, the 1405 two main routines of interest are: 1406 double exp_A2_known_mean(double x[], int n, double mean) 1407 double unif_A2_known_range(double x[], int n, 1408 double min_val, double max_val) 1409 Both take as their first argument, x, the array of n values to be 1410 tested. (Upon return, the elements of x are sorted.) The remaining 1411 parameters characterize the distribution to be used: either the mean 1412 (1/lambda), for an exponential distribution, or the lower and upper 1413 bounds, for a uniform distribution. The names of the routines stress 1414 that these values must be known in advance, and *not* estimated from 1415 the data (for example, by computing its sample mean). Estimating the 1416 parameters from the data *changes* the significance level of the test 1417 statistic. While [DS86] gives alternate significance tables for some 1418 instances in which the parameters are estimated from the data, for 1419 our purposes we expect that we should indeed know the parameters in 1420 advance, since what we will be testing are generally values such as 1421 packet sending times that we wish to verify follow a known distribu- 1422 tion. 1424 Both routines return a significance level, as described earlier. This 1425 is a value between 0 and 1. The correct use of the routines is to 1426 pick in advance the threshold for the significance level to test; 1427 generally, this will be 0.05, corresponding to 5%, as also described 1428 above. Subsequently, if the routines return a value strictly less 1429 than this threshold, then the data are deemed to be inconsistent with 1430 the presumed distribution, *subject to an error corresponding to the 1431 significance level*. That is, for a significance level of 5%, 5% of 1432 the time data that is indeed drawn from the presumed distribution 1433 will be erroneously deemed inconsistent. 1435 Thus, it is important to bear in mind that if these routines are used 1436 frequently, then one will indeed encounter occasional failures, even 1437 if the data is unblemished. 1439 Another important point concerning significance levels is that it is 1440 unsound to compare them in order to determine which of two sets of 1441 values is a "better" fit to a presumed distribution. Such testing 1442 should instead be done using "closeness-of-fit metrics" such as the 1443 lambda^2 metric described in [Pa94]. 1445 While the routines provided are for exponential and uniform distribu- 1446 tions with known parameters, it is generally straight-forward to 1447 write comparable routines for any distribution with known parameters. 1449 ID Framework for IP Performance Metrics November 1997 1451 The heart of the A2 tests lies in a statistic computed for testing 1452 whether a set of values is consistent with a uniform distribution 1453 between 0 and 1, which we term Unif(0, 1). If we wish to test 1454 whether a set of values, X, is consistent with a given distribution 1455 G(x), we first compute 1456 Y = G_inverse(X) 1457 If X is indeed distributed according to G(x), then Y will be distri- 1458 buted according to Unif(0, 1); so by testing Y for consistency with 1459 Unif(0, 1), we also test X for consistency with G(x). 1461 We note, however, that the process of computing Y above might yield 1462 values of Y outside the range (0..1). 
Such values should not occur if X is indeed distributed according to
G(x), but easily can occur if it is not.  In the latter case, we need
to avoid computing the central A2 statistic, since floating-point
exceptions may occur if any of the values lie outside (0..1).
Accordingly, the routines check for this possibility, and, if it is
encountered, return a raw A2 statistic of -1.  The routine that
converts the raw A2 statistic to a significance level likewise
propagates this value, returning a significance level of -1.  So, any
use of these routines must be prepared for a possible negative
significance level.

The last important point regarding use of the A2 statistic concerns
n, the number of values being tested.  If n < 5 then the test is not
meaningful, and in this case a significance level of -1 is returned.

On the other hand, for "real" data the test *gains* power as n
becomes larger.  It is well known in the statistics community that
real data almost never exactly matches a theoretical distribution,
even in cases such as rolling dice a great many times (see [Pa94] for
a brief discussion and references).  The A2 test is sensitive enough
that, for sufficiently large sets of real data, the test will almost
always fail, because it will manage to detect slight imperfections in
the fit of the data to the distribution.

For example, we have found that when testing 8,192 measured wire
times for packets sent at Poisson intervals, the measurements almost
always fail the A2 test.  On the other hand, testing 128 measurements
failed at 5% significance only about 5% of the time, as expected.
Thus, in general, when the test fails, care must be taken to
understand why it failed.

The remainder of this appendix gives C code for the routines
mentioned above.

/* Routines for computing the Anderson-Darling A2 test statistic.
 *
 * Implemented based on the description in "Goodness-of-Fit
 * Techniques," R. D'Agostino and M. Stephens, editors,
 * Marcel Dekker, Inc., 1986.
 */

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

/* Returns the raw A^2 test statistic for n sorted samples
 * z[0] .. z[n-1], for z ~ Unif(0,1).
 */
extern double compute_A2(double z[], int n);

/* Returns the significance level associated with an A^2 test
 * statistic value of A2, assuming no parameters of the tested
 * distribution were estimated from the data.
 */
extern double A2_significance(double A2);

/* Returns the A^2 significance level for testing n observations
 * x[0] .. x[n-1] against an exponential distribution with the
 * given mean.
 *
 * SIDE EFFECT: the x[0..n-1] are sorted upon return.
 */
extern double exp_A2_known_mean(double x[], int n, double mean);

/* Returns the A^2 significance level for testing n observations
 * x[0] .. x[n-1] against the uniform distribution [min_val, max_val].
 *
 * SIDE EFFECT: the x[0..n-1] are sorted upon return.
 */
extern double unif_A2_known_range(double x[], int n,
                double min_val, double max_val);

/* Returns a pseudo-random number distributed according to an
 * exponential distribution with the given mean.
 */
extern double random_exponential(double mean);

/* Helper function used by qsort() to sort double-precision
 * floating-point values.
 */
static int
compare_double(const void *v1, const void *v2)
{
    double d1 = *(double *) v1;
    double d2 = *(double *) v2;

    if (d1 < d2)
        return -1;
    else if (d1 > d2)
        return 1;
    else
        return 0;
}

double
compute_A2(double z[], int n)
{
    int i;
    double sum = 0.0;

    if ( n < 5 )
        /* Too few values. */
        return -1.0;

    /* If any of the values are outside the range (0, 1) then
     * fail immediately (and avoid a possible floating point
     * exception in the code below).
     */
    for (i = 0; i < n; ++i)
        if ( z[i] <= 0.0 || z[i] >= 1.0 )
            return -1.0;

    /* Page 101 of D'Agostino and Stephens. */
    for (i = 1; i <= n; ++i) {
        sum += (2 * i - 1) * log(z[i-1]);
        sum += (2 * n + 1 - 2 * i) * log(1.0 - z[i-1]);
    }
    return -n - (1.0 / n) * sum;
}

double
A2_significance(double A2)
{
    /* Page 105 of D'Agostino and Stephens. */
    if (A2 < 0.0)
        return A2;  /* Bogus A2 value - propagate it. */

    /* Check for possibly doctored values. */
    if (A2 <= 0.201)
        return 0.99;
    else if (A2 <= 0.240)
        return 0.975;
    else if (A2 <= 0.283)
        return 0.95;
    else if (A2 <= 0.346)
        return 0.90;
    else if (A2 <= 0.399)
        return 0.85;

    /* Now check for possible inconsistency. */
    if (A2 <= 1.248)
        return 0.25;
    else if (A2 <= 1.610)
        return 0.15;
    else if (A2 <= 1.933)
        return 0.10;
    else if (A2 <= 2.492)
        return 0.05;
    else if (A2 <= 3.070)
        return 0.025;
    else if (A2 <= 3.880)
        return 0.01;
    else if (A2 <= 4.500)
        return 0.005;
    else if (A2 <= 6.000)
        return 0.001;
    else
        return 0.0;
}

double
exp_A2_known_mean(double x[], int n, double mean)
{
    int i;
    double A2;

    /* Sort the first n values. */
    qsort(x, n, sizeof(x[0]), compare_double);

    /* Assuming they match an exponential distribution, transform
     * them to Unif(0,1).
     */
    for (i = 0; i < n; ++i) {
        x[i] = 1.0 - exp(-x[i] / mean);
    }

    /* Now make the A^2 test to see if they're truly uniform. */
    A2 = compute_A2(x, n);
    return A2_significance(A2);
}

double
unif_A2_known_range(double x[], int n, double min_val, double max_val)
{
    int i;
    double A2;
    double range = max_val - min_val;

    /* Sort the first n values. */
    qsort(x, n, sizeof(x[0]), compare_double);

    /* Transform Unif(min_val, max_val) to Unif(0,1). */
    for (i = 0; i < n; ++i)
        x[i] = (x[i] - min_val) / range;

    /* Now make the A^2 test to see if they're truly uniform. */
    A2 = compute_A2(x, n);
    return A2_significance(A2);
}

double
random_exponential(double mean)
{
    return -mean * log1p(-drand48());
}

18. References

[AK96] G. Almes and S. Kalidindi, "A One-way Delay Metric for IPPM",
   Internet Draft, November 1997.

[BM92] I. Bilinskis and A. Mikelsons, Randomized Signal Processing,
   Prentice Hall International, 1992.

[DS86] R. D'Agostino and M. Stephens, editors, Goodness-of-Fit
   Techniques, Marcel Dekker, Inc., 1986.

[CPB93] K. Claffy, G. Polyzos, and H-W. Braun, ``Application of
   Sampling Methodologies to Network Traffic Characterization,''
   Proc. SIGCOMM '93, pp. 194-203, San Francisco, September 1993.

[FJ94] S. Floyd and V. Jacobson, ``The Synchronization of Periodic
   Routing Messages,'' IEEE/ACM Transactions on Networking, 2(2),
   pp. 122-136, April 1994.

[Mi92] D. Mills, "Network Time Protocol (Version 3) Specification,
   Implementation and Analysis", RFC 1305, March 1992.

[Pa94] V. Paxson, ``Empirically-Derived Analytic Models of Wide-Area
   TCP Connections,'' IEEE/ACM Transactions on Networking, 2(4),
   pp. 316-336, August 1994.

[Pa96] V. Paxson, ``Towards a Framework for Defining Internet
   Performance Metrics,'' Proceedings of INET '96,
   ftp://ftp.ee.lbl.gov/papers/metrics-framework-INET96.ps.Z

[Pa97] V. Paxson, ``Measurements and Analysis of End-to-End Internet
   Dynamics,'' Ph.D. dissertation, U.C. Berkeley, 1997,
   ftp://ftp.ee.lbl.gov/papers/vp-thesis/dis.ps.gz.

19. Authors' Addresses

Vern Paxson
MS 50B/2239
Lawrence Berkeley National Laboratory
University of California
Berkeley, CA 94720
USA
Phone: +1 510/486-7504

Guy Almes
Advanced Network & Services, Inc.
200 Business Park Drive
Armonk, NY 10504
USA
Phone: +1 914/273-7863

Jamshid Mahdavi
Pittsburgh Supercomputing Center
4400 5th Avenue
Pittsburgh, PA 15213
USA
Phone: +1 412/268-6282

Matt Mathis
Pittsburgh Supercomputing Center
4400 5th Avenue
Pittsburgh, PA 15213
USA
Phone: +1 412/268-3319