Network Working Group              V. Paxson, Lawrence Berkeley National Lab
Internet Draft                     G. Almes, Advanced Network & Services
                                   J. Mahdavi, Pittsburgh Supercomputer Center
                                   M. Mathis, Pittsburgh Supercomputer Center
Expiration Date: May 1998          November 1997

                    Framework for IP Performance Metrics

1. Status of this Memo

This document is an Internet Draft.  Internet Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups.  Note that other groups may also distribute working documents as Internet Drafts.

Internet Drafts are draft documents valid for a maximum of six months, and may be updated, replaced, or obsoleted by other documents at any time.  It is inappropriate to use Internet Drafts as reference material or to cite them other than as ``work in progress''.

To learn the current status of any Internet Draft, please check the ``1id-abstracts.txt'' listing contained in the Internet Drafts shadow directories on ftp.is.co.za (Africa), nic.nordu.net (Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or ftp.isi.edu (US West Coast).

This memo provides information for the Internet community.  This memo does not specify an Internet standard of any kind.  Distribution of this memo is unlimited.
2. Introduction

The purpose of this memo is to define a general framework for particular metrics to be developed by the IETF's IP Performance Metrics effort, begun by the Benchmarking Methodology Working Group (BMWG) of the Operational Requirements Area, and being continued by the IP Performance Metrics Working Group (IPPM) of the Transport Area.

We begin by laying out several criteria for the metrics that we adopt.  These criteria are designed to promote an IPPM effort that will maximize an accurate common understanding by Internet users and Internet providers of the performance and reliability both of end-to-end paths through the Internet and of specific 'IP clouds' that comprise portions of those paths.

We next define some Internet vocabulary that will allow us to speak clearly about Internet components such as routers, paths, and clouds.

We then define the fundamental concepts of 'metric' and 'measurement methodology', which allow us to speak clearly about measurement issues.  Given these concepts, we proceed to discuss the important issue of measurement uncertainties and errors, and develop a key, somewhat subtle notion of how they relate to the analytical framework shared by many aspects of the Internet engineering discipline.  We then introduce the notion of empirically defined metrics, and finish this part of the document with a general discussion of how metrics can be 'composed'.

The remainder of the document deals with a variety of issues related to defining sound metrics and methodologies: how to deal with imperfect clocks; the notion of 'wire time' as distinct from 'host time'; how to aggregate sets of singleton metrics into samples and derive sound statistics from those samples; why it is recommended to avoid thinking about Internet properties in probabilistic terms (such as the probability that a packet is dropped), since these terms often include implicit assumptions about how the network behaves; the utility of defining metrics in terms of packets of a generic type; the benefits of preferring IP addresses to DNS host names; and the notion of 'standard-formed' packets.  An appendix discusses the Anderson-Darling test for gauging whether a set of values matches a given statistical distribution, and gives C code for an implementation of the test.

In some sections of the memo, we will surround some commentary text with the brackets {Comment: ... }.  We stress that this commentary is only commentary, and is not itself part of the framework document or a proposal of particular metrics.  In some cases this commentary will discuss some of the properties of metrics that might be envisioned, but the reader should assume that any such discussion is intended only to shed light on points made in the framework document, and not to suggest any specific metrics.

3. Criteria for IP Performance Metrics

The overarching goal of the IP Performance Metrics effort is to achieve a situation in which users and providers of Internet transport service have an accurate common understanding of the performance and reliability of the Internet component 'clouds' that they use/provide.

To achieve this, performance and reliability metrics for paths through the Internet must be developed.
In several IETF meetings, criteria for these metrics have been specified:
+ The metrics must be concrete and well-defined,
+ A methodology for a metric should have the property that it is repeatable: if the methodology is used multiple times under identical conditions, the same measurements should result.
+ The metrics must exhibit no bias for IP clouds implemented with identical technology,
+ The metrics must exhibit understood and fair bias for IP clouds implemented with non-identical technology,
+ The metrics must be useful to users and providers in understanding the performance they experience or provide,
+ The metrics must avoid inducing artificial performance goals.

4. Terminology for Paths and Clouds

The following list defines terms that need to be precise in the development of path metrics.  We begin with the low-level notions of host, link, and router, and then build up to the notion of a path and of decomposing a path into relevant pieces.

host
    A computer capable of communicating using the Internet protocols; includes "routers".

link
    A single link-level connection between two (or more) hosts; includes leased lines, ethernets, frame relay clouds, etc.

router
    A host which facilitates network-level communication between hosts by forwarding IP packets.

path
    A sequence of the form < h0, l1, h1, ..., ln, hn >, where n >= 0, each hi is a host, each li is a link between hi-1 and hi, and each h1...hn-1 is a router.  A pair < li, hi > is termed a 'hop'.  In an appropriate operational configuration, the links and routers in the path facilitate network-layer communication of packets from h0 to hn.  Note that a path is a unidirectional concept.

subpath
    Given a path, a subpath is any subsequence of the given path which is itself a path.  (Thus, the first and last element of a subpath is a host.)

cloud
    An undirected (possibly cyclic) graph whose vertices are routers and whose edges are links that connect pairs of routers.  Formally, ethernets, frame relay clouds, and other links that connect more than two routers are modelled as fully-connected meshes of graph edges.  Note that to connect to a cloud means to connect to a router of the cloud over a link; this link is not itself part of the cloud.

exchange
    A special case of a link, an exchange directly connects either a host to a cloud and/or one cloud to another cloud.

cloud subpath
    A subpath of a given path, all of whose hosts are routers of a given cloud.

path digest
    A sequence of the form < h0, e1, C1, ..., en, hn >, where n >= 0, h0 and hn are hosts, each e1 ... en is an exchange, and each C1 ... Cn-1 is a cloud subpath.

5. Fundamental Concepts

5.1. Metrics

In the operational Internet, there are several quantities related to the performance and reliability of the Internet that we'd like to know the value of.  When such a quantity is carefully specified, we term the quantity a metric.  We anticipate that there will be separate RFCs for each metric (or for each closely related group of metrics).

In some cases, there might be no obvious means to effectively measure the metric; this is allowed, and even understood to be very useful in some cases.  It is required, however, that the specification of the metric be as clear as possible about what quantity is being specified.
Thus, difficulty in practical measurement is sometimes allowed, but ambiguity in meaning is not.

Each metric will be defined in terms of standard units of measurement.  The international metric system will be used, with the following points specifically noted:
+ When a unit is expressed in simple meters (for distance/length) or seconds (for duration), appropriate related units based on thousands or thousandths of acceptable units are acceptable.  Thus, distances expressed in kilometers (km), durations expressed in milliseconds (ms) or microseconds (us), are allowed, but not centimeters (because the prefix is not in terms of thousands or thousandths).
+ When a unit is expressed in a combination of units, appropriate related units based on thousands or thousandths of acceptable units are acceptable, but all such thousands/thousandths must be grouped at the beginning.  Thus, kilometers per second (km/s) is allowed, but meters per millisecond is not.
+ The unit of information is the bit.
+ When metric prefixes are used with bits or with combinations including bits, those prefixes will have their metric meaning (related to decimal 1000), and not the meaning conventional with computer storage (related to decimal 1024).  In any RFC that defines a metric whose units include bits, this convention will be followed and will be repeated to ensure clarity for the reader.
+ When a time is given, it will be expressed in UTC.

Note that these points apply to the specifications for metrics and not, for example, to packet formats, where octets will likely be used in preference/addition to bits.

Finally, we note that some metrics may be defined purely in terms of other metrics; such metrics are called 'derived metrics'.

5.2. Measurement Methodology

For a given set of well-defined metrics, a number of distinct measurement methodologies may exist.  A partial list includes:
+ Direct measurement of a performance metric using injected test traffic.  Example: measurement of the round-trip delay of an IP packet of a given size over a given route at a given time.
+ Projection of a metric from lower-level measurements.  Example: given accurate measurements of propagation delay and bandwidth for each step along a path, projection of the complete delay for the path for an IP packet of a given size.  (A sketch of this approach appears below.)
+ Estimation of a constituent metric from a set of more aggregated measurements.  Example: given accurate measurements of delay for a given one-hop path for IP packets of different sizes, estimation of propagation delay for the link of that one-hop path.
+ Estimation of a given metric at one time from a set of related metrics at other times.  Example: given an accurate measurement of flow capacity at a past time, together with a set of accurate delay measurements for that past time and the current time, and given a model of flow dynamics, estimate the flow capacity that would be observed at the current time.

This list is by no means exhaustive.  The purpose is to point out the variety of measurement techniques.

When a given metric is specified, a given measurement approach might be noted and discussed.  That approach, however, is not formally part of the specification.
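{Comment: as a concrete illustration of the 'projection' approach in the list above, the following sketch projects the delay of a path for an IP packet of a given size from per-hop propagation delays and bandwidths.  It is illustrative only; the per-hop figures and the packet size are invented and are not part of any metric definition.

    #include <stdio.h>

    /* Illustrative projection of path delay from per-hop measurements:
     * each hop contributes its propagation delay plus the time to
     * transmit the packet at that hop's bandwidth.  All figures below
     * are invented for the sake of the example. */
    int main(void)
    {
        double prop[] = { 0.002, 0.015, 0.001 };    /* seconds per hop */
        double bw[]   = { 1.5e6, 45.0e6, 10.0e6 };  /* bits/second per hop */
        int    nhops  = 3;
        double pkt_bits = 100.0 * 8.0;              /* a 100-byte packet */
        double delay = 0.0;
        int    i;

        for (i = 0; i < nhops; i++)
            delay += prop[i] + pkt_bits / bw[i];

        printf("projected one-way delay: %.6f seconds\n", delay);
        return 0;
    }

Such a projection is, of course, only as sound as the underlying per-hop measurements and the simple store-and-forward model behind the formula.}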
A methodology for a metric should have the property that it is repeatable: if the methodology is used multiple times under identical conditions, it should result in consistent measurements.

Backing off a little from the word 'identical' in the previous paragraph, we could more accurately use the word 'continuity' to describe a property of a given methodology: a methodology for a given metric exhibits continuity if, for small variations in conditions, it results in small variations in the resulting measurements.  Slightly more precisely, for every positive epsilon, there exists a positive delta, such that if two sets of conditions are within delta of each other, then the resulting measurements will be within epsilon of each other.  At this point, this should be taken as a heuristic driving our intuition about one kind of robustness property rather than as a precise notion.

A metric that has at least one methodology that exhibits continuity is said itself to exhibit continuity.

Note that some metrics, such as hop-count along a path, are integer-valued and therefore cannot exhibit continuity in quite the sense given above.

Note further that, in practice, it may not be practical to know (or be able to quantify) the conditions relevant to a measurement at a given time.  For example, since the instantaneous load (in packets to be served) at a given router in a high-speed wide-area network can vary widely over relatively brief periods and will be very hard for an external observer to quantify, various statistics of a given metric may be more repeatable, or may better exhibit continuity.  In that case those particular statistics should be specified when the metric is specified.

Finally, some measurement methodologies may be 'conservative' in the sense that the act of measurement does not modify, or only slightly modifies, the value of the performance metric the methodology attempts to measure.  {Comment: for example, in a wide-area high-speed network under modest load, a test using several small 'ping' packets to measure delay would likely not interfere (much) with the delay properties of that network as observed by others.  The corresponding statement about tests using a large flow to measure flow capacity would likely fail.}

5.3. Measurements, Uncertainties, and Errors

Even the very best measurement methodologies for the very most well behaved metrics will exhibit errors.  Those who develop such measurement methodologies, however, should strive to:
+ minimize their uncertainties/errors,
+ understand and document the sources of uncertainty/error, and
+ quantify the amounts of uncertainty/error.

For example, when developing a method for measuring delay, understand how any errors in your clocks introduce errors into your delay measurement, and quantify this effect as well as you can.  In some cases, this will result in a requirement that a clock be at least up to a certain quality if it is to be used to make a certain measurement.

As a second example, consider the timing error due to measurement overheads within the computer making the measurement, as opposed to delays due to the Internet component being measured.  The former is a measurement error, while the latter reflects the metric of interest.
Note that one technique that can help avoid this overhead is the use of a packet filter/sniffer, running on a separate computer that records network packets and timestamps them accurately (see the discussion of 'wire time' below).  The resulting trace can then be analysed to assess the test traffic, minimising the effect of measurement host delays, or at least allowing those delays to be accounted for.  We note that this technique may prove beneficial even if the packet filter/sniffer runs on the same machine, because such measurements generally provide 'kernel-level' timestamping as opposed to less-accurate 'application-level' timestamping.

Finally, we note that derived metrics (defined above) or metrics that exhibit spatial or temporal composition (defined below) offer particular occasion for the analysis of measurement uncertainties, namely how the uncertainties propagate (conceptually) due to the derivation or composition.

6. Metrics and the Analytical Framework

As the Internet has evolved from the early packet-switching studies of the 1960s, the Internet engineering community has evolved a common analytical framework of concepts.  This analytical framework, or A-frame, used by designers and implementers of protocols, by those involved in measurement, and by those who study computer network performance using the tools of simulation and analysis, has great advantage to our work.  A major objective here is to generate network characterizations that are consistent in both analytical and practical settings, since this will maximize the chances that non-empirical network study can be better correlated with, and used to further our understanding of, real network behavior.

Whenever possible, therefore, we would like to develop and leverage off of the A-frame.  Thus, whenever a metric to be specified is understood to be closely related to concepts within the A-frame, we will attempt to specify the metric in the A-frame's terms.  In such a specification we will develop the A-frame by precisely defining the concepts needed for the metric, then leverage off of the A-frame by defining the metric in terms of those concepts.

Such a metric will be called an 'analytically specified metric' or, more simply, an analytical metric.

{Comment: Examples of such analytical metrics might include:

propagation time of a link
    The time, in seconds, required by a single bit to travel from the output port on one Internet host across a single link to another Internet host.

bandwidth of a link for packets of size k
    The capacity, in bits/second, where only those bits of the IP packet are counted, for packets of size k bytes.

route
    The path, as defined in Section 4, from A to B at a given time.

hop count of a route
    The value 'n' of the route path.
}

Note that we make no a priori list of just what A-frame concepts will emerge in these specifications, but we do encourage their use and urge that they be carefully specified so that, as our set of metrics develops, so will a specified set of A-frame concepts technically consistent with each other and consonant with the common understanding of those concepts within the general Internet community.
385 These A-frame concepts will be intended to abstract from actual 386 Internet components in such a way that: 387 + the essential function of the component is retained, 388 + properties of the component relevant to the metrics we aim to 389 create are retained, 390 + a subset of these component properties are potentially defined as 391 analytical metrics, and 392 + those properties of actual Internet components not relevant to 393 defining the metrics we aim to create are dropped. 395 For example, when considering a router in the context of packet for- 396 warding, we might model the router as a component that receives pack- 397 ets on an input link, queues them on a FIFO packet queue of finite 398 size, employs tail-drop when the packet queue is full, and forwards 399 them on an output link. The transmission speed (in bits/second) of 400 the input and output links, the latency in the router (in seconds), 401 and the maximum size of the packet queue (in bits) are relevant 402 analytical metrics. 404 In some cases, such analytical metrics used in relation to a router 405 will be very closely related to specific metrics of the performance 406 of Internet paths. For example, an obvious formula (L + P/B) involv- 407 ing the latency in the router (L), the packet size (in bits) (P), and 408 the transmission speed of the output link (B) might closely approxi- 409 mate the increase in packet delay due to the insertion of a given 410 router along a path. 412 We stress, however, that well-chosen and well-specified A-frame con- 413 cepts and their analytical metrics will support more general metric 414 creation efforts in less obvious ways. 416 {Comment: for example, when considering the flow capacity of a path, 417 it may be of real value to be able to model each of the routers along 418 the path as packet forwarders as above. Techniques for estimating 419 the flow capacity of a path might use the maximum packet queue size 420 as a parameter in decidedly non-obvious ways. For example, as the 421 maximum queue size increases, so will the ability of the router to 422 continuously move traffic along an output link despite fluctuations 423 in traffic from an input link. Estimating this increase, however, 424 remains a research topic.} 426 Note that, when we specify A-frame concepts and analytical metrics, 427 we will inevitably make simplifying assumptions. The key role of 428 these concepts is to abstract the properties of the Internet com- 429 ponents relevant to given metrics. Judgement is required to avoid 430 making assumptions that bias the modeling and metric effort toward 432 ID Framework for IP Performance Metrics November 1997 434 one kind of design. 436 {Comment: for example, routers might not use tail-drop, even though 437 tail-drop might be easier to model analytically.} 439 Finally, note that different elements of the A-frame might well make 440 different simplifying assumptions. For example, the abstraction of a 441 router used to further the definition of path delay might treat the 442 router's packet queue as a single FIFO queue, but the abstraction of 443 a router used to further the definition of the handling of an RSVP- 444 enabled packet might treat the router's packet queue as supporting 445 bounded delay -- a contradictory assumption. This is not to say that 446 we make contradictory assumptions at the same time, but that two dif- 447 ferent parts of our work might refine the simpler base concept in two 448 divergent ways for different purposes. 
{Comment: in more mathematical terms, we would say that the A-frame taken as a whole need not be consistent; but the set of particular A-frame elements used to define a particular metric must be.}

7. Empirically Specified Metrics

There are useful performance and reliability metrics that do not fit so neatly into the A-frame, usually because the A-frame lacks the detail or power for dealing with them.  For example, "the best flow capacity achievable along a path using an RFC-2001-compliant TCP" would be good to be able to measure, but we have no analytical framework of sufficient richness to allow us to cast that flow capacity as an analytical metric.

These notions can still be well specified by instead describing a reference methodology for measuring them.

Such a metric will be called an 'empirically specified metric', or more simply, an empirical metric.

Such empirical metrics should have three properties:
+ we should have a clear definition for each in terms of Internet components,
+ we should have at least one effective means to measure them, and
+ to the extent possible, we should have a (necessarily incomplete) understanding of the metric in terms of the A-frame so that we can use our measurements to reason about the performance and reliability of A-frame components and of aggregations of A-frame components.

8. Two Forms of Composition

8.1. Spatial Composition of Metrics

In some cases, it may be realistic and useful to define metrics in such a fashion that they exhibit spatial composition.

By spatial composition, we mean a characteristic of some path metrics, in which the metric as applied to a (complete) path can also be defined for various subpaths, and in which the appropriate A-frame concepts for the metric suggest useful relationships between the metric applied to these various subpaths (including the complete path, the various cloud subpaths of a given path digest, and even single routers along the path).  The effectiveness of spatial composition depends:
+ on the usefulness in analysis of these relationships as applied to the relevant A-frame components, and
+ on the practical use of the corresponding relationships as applied to metrics and to measurement methodologies.

{Comment: for example, consider some metric for delay of a 100-byte packet across a path P, and consider further a path digest < h0, e1, C1, ..., en, hn > of P.  The definition of such a metric might include a conjecture that the delay across P is very nearly the sum of the corresponding metric across the exchanges (ei) and clouds (Ci) of the given path digest.  The definition would further include a note on how a corresponding relation applies to relevant A-frame components, both for the path P and for the exchanges and clouds of the path digest.}

When the definition of a metric includes a conjecture that the metric across the path is related to the metric across the subpaths of the path, that conjecture constitutes a claim that the metric exhibits spatial composition.
The definition should then include:
+ the specific conjecture applied to the metric,
+ a justification of the practical utility of the composition in terms of making accurate measurements of the metric on the path,
+ a justification of the usefulness of the composition in terms of making analysis of the path using A-frame concepts more effective, and
+ an analysis of how the conjecture could be incorrect.

8.2. Temporal Composition of Formal Models and Empirical Metrics

In some cases, it may be realistic and useful to define metrics in such a fashion that they exhibit temporal composition.

By temporal composition, we mean a characteristic of some path metrics, in which the metric as applied to a path at a given time T is also defined for various times t0 < t1 < ... < tn < T, and in which the appropriate A-frame concepts for the metric suggest useful relationships between the metric applied at times t0, ..., tn and the metric applied at time T.  The effectiveness of temporal composition depends:
+ on the usefulness in analysis of these relationships as applied to the relevant A-frame components, and
+ on the practical use of the corresponding relationships as applied to metrics and to measurement methodologies.

{Comment: for example, consider a metric for the expected flow capacity across a path P during the five-minute period surrounding the time T, and suppose further that we have the corresponding values for each of the four previous five-minute periods t0, t1, t2, and t3.  The definition of such a metric might include a conjecture that the flow capacity at time T can be estimated from a certain kind of extrapolation from the values of t0, ..., t3.  The definition would further include a note on how a corresponding relation applies to relevant A-frame components.

Note: any (spatial or temporal) compositions involving flow capacity are likely to be subtle, and temporal compositions are generally more subtle than spatial compositions, so the reader should understand that the foregoing example is intentionally naive.}

When the definition of a metric includes a conjecture that the metric across the path at a given time T is related to the metric across the path for a set of other times, that conjecture constitutes a claim that the metric exhibits temporal composition.  The definition should then include:
+ the specific conjecture applied to the metric,
+ a justification of the practical utility of the composition in terms of making accurate measurements of the metric on the path, and
+ a justification of the usefulness of the composition in terms of making analysis of the path using A-frame concepts more effective.

9. Issues related to Time

9.1. Clock Issues

Measurements of time lie at the heart of many Internet metrics.  Because of this, it will often be crucial when designing a methodology for measuring a metric to understand the different types of errors and uncertainties introduced by imperfect clocks.  In this section we define terminology for discussing the characteristics of clocks and touch upon related measurement issues which need to be addressed by any sound methodology.
584 The Network Time Protocol (NTP; RFC 1305) defines a nomenclature for 585 discussing clock characteristics, which we will also use when 586 appropriate [Mi92]. The main goal of NTP is to provide accurate 587 timekeeping over fairly long time scales, such as minutes to days, 588 while for measurement purposes often what is more important is 589 short-term accuracy, between the beginning of the measurement and the 590 end, or over the course of gathering a body of measurements (a sam- 591 ple). This difference in goals sometimes leads to different defini- 592 tions of terminology as well, as discussed below. 594 To begin, we define a clock's "offset" at a particular moment as the 595 difference between the time reported by the clock and the "true" time 596 as defined by UTC. If the clock reports a time Tc and the true time 597 is Tt, then the clock's offset is Tc - Tt. 599 We will refer to a clock as "accurate" at a particular moment if the 600 clock's offset is zero, and more generally a clock's "accuracy" is 601 how close the absolute value of the offset is to zero. For NTP, 602 accuracy also includes a notion of the frequency of the clock; for 603 our purposes, we instead incorporate this notion into that of "skew", 604 because we define accuracy in terms of a single moment in time rather 605 than over an interval of time. 607 A clock's "skew" at a particular moment is the frequency difference 608 (first derivative of its offset with respect to true time) between 609 the clock and true time. 611 As noted in RFC 1305, real clocks exhibit some variation in skew. 612 That is, the second derivative of the clock's offset with respect to 613 true time is generally non-zero. In keeping with RFC 1305, we define 614 this quantity as the clock's "drift". 616 A clock's "resolution" is the smallest unit by which the clock's time 617 is updated. It gives a lower bound on the clock's uncertainty. 618 (Note that clocks can have very fine resolutions and yet be wildly 620 ID Framework for IP Performance Metrics November 1997 622 inaccurate.) Resolution is defined in terms of seconds. However, 623 resolution is relative to the clock's reported time and not to true 624 time, so for example a resolution of 10 ms only means that the clock 625 updates its notion of time in 0.01 second increments, not that this 626 is the true amount of time between updates. 628 {Comment: Systems differ on how an application interface to the clock 629 reports the time on subsequent calls during which the clock has not 630 advanced. Some systems simply return the same unchanged time as 631 given for previous calls. Others may add a small increment to the 632 reported time to maintain monotonic increasing timestamps. For sys- 633 tems that do the latter, we do *not* consider these small increments 634 when defining the clock's resolution. They are instead an impediment 635 to assessing the clock's resolution, since a natural method for doing 636 so is to repeatedly query the clock to determine the smallest non- 637 zero difference in reported times.} 639 It is expected that a clock's resolution changes only rarely (for 640 example, due to a hardware upgrade). 642 There are a number of interesting metrics for which some natural 643 measurement methodologies involve comparing times reported by two 644 different clocks. An example is one-way packet delay (currently an 645 Internet Draft [AK96]). 
Here, the time required for a packet to travel through the network is measured by comparing the time reported by a clock at one end of the packet's path, corresponding to when the packet first entered the network, with the time reported by a clock at the other end of the path, corresponding to when the packet finished traversing the network.

We are thus also interested in terminology for describing how two clocks C1 and C2 compare.  To do so, we introduce terms related to those above in which the notion of "true time" is replaced by the time as reported by clock C1.  For example, clock C2's offset relative to C1 at a particular moment is Tc2 - Tc1, the instantaneous difference in time reported by C2 and C1.  To disambiguate between the use of the terms to compare two clocks versus the use of the terms to compare to true time, we will in the former case use the phrase "relative".  So the offset defined earlier in this paragraph is the "relative offset" between C2 and C1.

When comparing clocks, the analog of "resolution" is not "relative resolution", but instead "joint resolution", which is the sum of the resolutions of C1 and C2.  The joint resolution then indicates a conservative lower bound on the accuracy of any time intervals computed by subtracting timestamps generated by one clock from those generated by the other.

If two clocks are "accurate" with respect to one another (their relative offset is zero), we will refer to the pair of clocks as "synchronized".  Note that clocks can be highly synchronized yet arbitrarily inaccurate in terms of how well they tell true time.  This point is important because for many Internet measurements, synchronization between two clocks is more important than the accuracy of the clocks.  This is somewhat true of skew, too: as long as the absolute skew is not too great, minimal relative skew is more important, since relative skew can induce systematic trends in packet transit times measured by comparing timestamps produced by the two clocks.

These distinctions arise because for Internet measurement what is often most important are differences in time as computed by comparing the output of two clocks.  The process of computing the difference removes any error due to clock inaccuracies with respect to true time; but it is crucial that the differences themselves accurately reflect differences in true time.

Measurement methodologies will often begin with the step of assuring that two clocks are synchronized and have minimal skew and drift.  {Comment: An effective way to assure these conditions (and also clock accuracy) is by using clocks that derive their notion of time from an external source, rather than only the host computer's clock.  (These latter are often subject to large errors.)
It is further preferable 696 that the clocks directly derive their time, for example by having 697 immediate access to a GPS (Global Positioning System) unit.} 699 Two important concerns arise if the clocks indirectly derive their 700 time using a network time synchronization protocol such as NTP: 701 + First, NTP's accuracy depends in part on the properties (particu- 702 larly delay) of the Internet paths used by the NTP peers, and 703 these might be exactly the properties that we wish to measure, so 704 it would be unsound to use NTP to calibrate such measurements. 705 + Second, NTP focuses on clock accuracy, which can come at the 706 expense of short-term clock skew and drift. For example, when a 707 host's clock is indirectly synchronized to a time source, if the 708 synchronization intervals occur infrequently, then the host will 709 sometimes be faced with the problem of how to adjust its current, 710 incorrect time, Ti, with a considerably different, more accurate 711 time it has just learned, Ta. Two general ways in which this is 712 done are to either immediately set the current time to Ta, or to 713 adjust the local clock's update frequency (hence, its skew) so 714 that at some point in the future the local time Ti' will agree 715 with the more accurate time Ta'. The first mechanism introduces 716 discontinuities and can also violate common assumptions that 717 timestamps are monotone increasing. If the host's clock is set 718 backward in time, sometimes this can be easily detected. If the 719 clock is set forward in time, this can be harder to detect. The 721 ID Framework for IP Performance Metrics November 1997 723 skew induced by the second mechanism can lead to considerable 724 inaccuracies when computing differences in time, as discussed 725 above. 727 To illustrate why skew is a crucial concern, consider samples of 728 one-way delays between two Internet hosts made at one minute inter- 729 vals. The true transmission delay between the hosts might plausibly 730 be on the order of 50 ms for a transcontinental path. If the skew 731 between the two clocks is 0.01%, that is, 1 part in 10,000, then 732 after 10 minutes of observation the error introduced into the meas- 733 urement is 60 ms. Unless corrected, this error is enough to com- 734 pletely wipe out any accuracy in the transmission delay measurement. 735 Finally, we note that assessing skew errors between unsynchronized 736 network clocks is an open research area. (See [Pa97] for a discus- 737 sion of detecting and compensating for these sorts of errors.) This 738 shortcoming makes use of a solid, independent clock source such as 739 GPS especially desirable. 741 9.2. The Notion of "Wire Time" 743 Internet measurement is often complicated by the use of Internet 744 hosts themselves to perform the measurement. These hosts can intro- 745 duce delays, bottlenecks, and the like that are due to hardware or 746 operating system effects and have nothing to do with the network 747 behavior we would like to measure. This problem is particularly 748 acute when timestamping of network events occurs at the application 749 level. 751 In order to provide a general way of talking about these effects, we 752 introduce two notions of "wire time". These notions are only defined 753 in terms of an Internet host H observing an Internet link L at a par- 754 ticular location: 755 + For a given packet P, the 'wire arrival time' of P at H on L is 756 the first time T at which any bit of P has appeared at H's obser- 757 vational position on L. 
758 + For a given packet P, the 'wire exit time' of P at H on L is the 759 first time T at which all the bits of P have appeared at H's 760 observational position on L. 761 Note that intrinsic to the definition is the notion of where on the 762 link we are observing. This distinction is important because for 763 large-latency links, we may obtain very different times depending on 764 exactly where we are observing the link. We could allow the observa- 765 tional position to be an arbitrary location along the link; however, 766 we define it to be in terms of an Internet host because we anticipate 767 in practice that, for IPPM metrics, all such timing will be con- 768 strained to be performed by Internet hosts, rather than specialized 769 hardware devices that might be able to monitor a link at locations 771 ID Framework for IP Performance Metrics November 1997 773 where a host cannot. This definition also takes care of the problem 774 of links that are comprised of multiple physical channels. Because 775 these multiple channels are not visible at the IP layer, they cannot 776 be individually observed in terms of the above definitions. 778 It is possible, though one hopes uncommon, that a packet P might make 779 multiple trips over a particular link L, due to a forwarding loop. 780 These trips might even overlap, depending on the link technology. 781 Whenever this occurs, we define a separate wire time associated with 782 each instance of P seen at H's position on the link. This definition 783 is worth making because it serves as a reminder that notions like 784 *the* unique time a packet passes a point in the Internet are 785 inherently slippery. 787 The term wire time has historically been used to loosely denote the 788 time at which a packet appeared on a link, without exactly specifying 789 whether this refers to the first bit, the last bit, or some other 790 consideration. This informal definition is generally already very 791 useful, as it is usually used to make a distinction between when the 792 packet's propagation delays begin and cease to be due to the network 793 rather than the endpoint hosts. 795 When appropriate, metrics should be defined in terms of wire times 796 rather than host endpoint times, so that the metric's definition 797 highlights the issue of separating delays due to the host from those 798 due to the network. 800 We note that one potential difficulty when dealing with wire times 801 concerns IP fragments. It may be the case that, due to fragmenta- 802 tion, only a portion of a particular packet passes by H's location. 803 Such fragments are themselves legitimate packets and have well- 804 defined wire times associated with them; but the larger IP packet 805 corresponding to their aggregate may not. 807 We also note that these notions have not, to our knowledge, been pre- 808 viously defined in exact terms for Internet traffic. Consequently, 809 we may find with experience that these definitions require some 810 adjustment in the future. 812 {Comment: It can sometimes be difficult to measure wire times. One 813 technique is to use a packet filter to monitor traffic on a link. 814 The architecture of these filters often attempts to associate with 815 each packet a timestamp as close to the wire time as possible. We 816 note however that one common source of error is to run the packet 817 filter on one of the endpoint hosts. 
In this case, it has been 818 observed that some packet filters receive for some packets timestamps 819 corresponding to when the packet was *scheduled* to be injected into 820 the network, rather than when it actually was *sent* out onto the 822 ID Framework for IP Performance Metrics November 1997 824 network (wire time). There can be a substantial difference between 825 these two times. A technique for dealing with this problem is to run 826 the packet filter on a separate host that passively monitors the 827 given link. This can be problematic however for some link technolo- 828 gies. See [Pa97] for a discussion of the sorts of errors packet 829 filters can exhibit. Finally, we note that packet filters will often 830 only capture the first fragment of a fragmented IP packet, due to the 831 use of filtering on fields in the IP and transport protocol headers. 832 As we generally desire our measurement methodologies to avoid the 833 complexity of creating fragmented traffic, one strategy for dealing 834 with their presence as detected by a packet filter is to flag that 835 the measured traffic has an unusual form and abandon further analysis 836 of the packet timing.} 838 10. Singletons, Samples, and Statistics 840 With experience we have found it useful to introduce a separation 841 between three distinct -- yet related -- notions: 842 + By a 'singleton' metric, we refer to metrics that are, in a sense, 843 atomic. For example, a single instance of "bulk throughput capa- 844 city" from one host to another might be defined as a singleton 845 metric, even though the instance involves measuring the timing of 846 a number of Internet packets. 847 + By a 'sample' metric, we refer to metrics derived from a given 848 singleton metric by taking a number of distinct instances 849 together. For example, we might define a sample metric of one-way 850 delays from one host to another as an hour's worth of measure- 851 ments, each made at Poisson intervals with a mean spacing of one 852 second. 853 + By a 'statistical' metric, we refer to metrics derived from a 854 given sample metric by computing some statistic of the values 855 defined by the singleton metric on the sample. For example, the 856 mean of all the one-way delay values on the sample given above 857 might be defined as a statistical metric. 858 By applying these notions of singleton, sample, and statistic in a 859 consistent way, we will be able to reuse lessons learned about how to 860 define samples and statistics on various metrics. The orthogonality 861 among these three notions will thus make all our work more effective 862 and more intelligible by the community. 864 In the remainder of this section, we will cover some topics in sam- 865 pling and statistics that we believe will be important to a variety 866 of metric definitions and measurement efforts. 868 ID Framework for IP Performance Metrics November 1997 870 10.1. Methods of Collecting Samples 872 The main reason for collecting samples is to see what sort of varia- 873 tions and consistencies are present in the metric being measured. 874 These variations might be with respect to different points in the 875 Internet, or different measurement times. When assessing variations 876 based on a sample, one generally makes an assumption that the sample 877 is "unbiased", meaning that the process of collecting the measure- 878 ments in the sample did not skew the sample so that it no longer 879 accurately reflects the metric's variations and consistencies. 
881 One common way of collecting samples is to make measurements 882 separated by fixed amounts of time: periodic sampling. Periodic sam- 883 pling is particularly attractive because of its simplicity, but it 884 suffers from two potential problems: 885 + If the metric being measured itself exhibits periodic behavior, 886 then there is a possibility that the sampling will observe only 887 part of the periodic behavior if the periods happen to agree 888 (either directly, or if one is a multiple of the other). Related 889 to this problem is the notion that periodic sampling can be easily 890 anticipated. Predictable sampling is susceptible to manipulation 891 if there are mechanisms by which a network component's behavior 892 can be temporarily changed such that the sampling only sees the 893 modified behavior. 894 + The act of measurement can perturb what is being measured (for 895 example, injecting measurement traffic into a network alters the 896 congestion level of the network), and repeated periodic perturba- 897 tions can drive a network into a state of synchronization (cf. 898 [FJ94]), greatly magnifying what might individually be minor 899 effects. 901 A more sound approach is based on "random additive sampling": samples 902 are separated by independent, randomly generated intervals that have 903 a common statistical distribution G(t) [BM92]. The quality of this 904 sampling depends on the distribution G(t). For example, if G(t) gen- 905 erates a constant value g with probability one, then the sampling 906 reduces to periodic sampling with a period of g. 908 Random additive sampling gains significant advantages. In general, 909 it avoids synchronization effects and yields an unbiased estimate of 910 the property being sampled. The only significant drawbacks with it 911 are: 912 + it complicates frequency-domain analysis, because the samples do 913 not occur at fixed intervals such as assumed by Fourier-transform 914 techniques; and 916 ID Framework for IP Performance Metrics November 1997 918 + unless G(t) is the exponential distribution (see below), sampling 919 still remains somewhat predictable, as discussed for periodic sam- 920 pling above. 922 10.1.1. Poisson Sampling 924 It can be proved that if G(t) is an exponential distribution with 925 rate lambda, that is 926 G(t) = 1 - exp(-lambda * t) 927 then the arrival of new samples *cannot* be predicted (and, again, 928 the sampling is unbiased). Furthermore, the sampling is asymptoti- 929 cally unbiased even if the act of sampling affects the network's 930 state. Such sampling is referred to as "Poisson sampling". It is 931 not prone to inducing synchronization, it can be used to accurately 932 collect measurements of periodic behavior, and it is not prone to 933 manipulation by anticipating when new samples will occur. 935 Because of these valuable properties, we in general prefer that sam- 936 ples of Internet measurements are gathered using Poisson sampling. 937 {Comment: We note, however, that there may be circumstances that 938 favor use of a different G(t). For example, the exponential distri- 939 bution is unbounded, so its use will on occasion generate lengthy 940 spaces between sampling times. We might instead desire to bound the 941 longest such interval to a maximum value dT, to speed the convergence 942 of the estimation derived from the sampling. This could be done by 943 using 944 G(t) = Unif(0, dT) 945 that is, the uniform distribution between 0 and dT. 
This sampling, 946 of course, becomes highly predictable if an interval of nearly length 947 dT has elapsed without a sample occurring.} 949 In its purest form, Poisson sampling is done by generating indepen- 950 dent, exponentially distributed intervals and gathering a single 951 measurement after each interval has elapsed. It can be shown that if 952 starting at time T one performs Poisson sampling over an interval dT, 953 during which a total of N measurements happen to be made, then those 954 measurements will be uniformly distributed over the interval [T, 955 T+dT]. So another way of conducting Poisson sampling is to pick dT 956 and N and generate N random sampling times uniformly over the inter- 957 val [T, T+dT]. The two approaches are equivalent, except if N and dT 958 are externally known. In that case, the property of not being able 959 to predict measurement times is weakened (the other properties still 960 hold). The N/dT approach has an advantage that dealing with fixed 961 values of N and dT can be simpler than dealing with a fixed lambda 962 but variable numbers of measurements over variably-sized intervals. 964 ID Framework for IP Performance Metrics November 1997 966 10.1.2. Geometric Sampling 968 Closely related to Poisson sampling is "geometric sampling", in which 969 external events are measured with a fixed probability p. For exam- 970 ple, one might capture all the packets over a link but only record 971 the packet to a trace file if a randomly generated number uniformly 972 distributed between 0 and 1 is less than a given p. Geometric sam- 973 pling has the same properties of being unbiased and not predictable 974 in advance as Poisson sampling, so if it fits a particular Internet 975 measurement task, it too is sound. See [CPB93] for more discussion. 977 10.1.3. Generating Poisson Sampling Intervals 979 To generate Poisson sampling intervals, one first determines the rate 980 lambda at which the singleton measurements will on average be made 981 (e.g., for an average sampling interval of 30 seconds, we have lambda 982 = 1/30, if the units of time are seconds). One then generates a 983 series of exponentially-distributed (pseudo-)random numbers E1, E2, 984 ..., En. The first measurement is made at time E1, the next at time 985 E1+E2, and so on. 987 One technique for generating exponentially-distributed (pseudo- 988 )random numbers is based on the ability to generate U1, U2, ..., Un, 989 (pseudo-)random numbers that are uniformly distributed between 0 and 990 1. Many computers provide libraries that can do this. Given such 991 Ui, to generate Ei one uses: 992 Ei = -log(Ui) / lambda 993 where log(Ui) is the natural logarithm of Ui. {Comment: This tech- 994 nique is an instance of the more general "inverse transform" method 995 for generating random numbers with a given distribution.} 997 Implementation details: 999 There are at least three different methods for approximating Poisson 1000 sampling, which we describe here as Methods 1 through 3. Method 1 is 1001 the easiest to implement and has the most error, and method 3 is the 1002 most difficult to implement and has the least error (potentially 1003 none). 1005 Method 1 is to proceed as follows: 1006 1. Generate E1 and wait that long. 1007 2. Perform a measurement. 1008 3. Generate E2 and wait that long. 1009 4. Perform a measurement. 1010 5. Generate E3 and wait that long. 1011 6. Perform a measurement ... 
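{Comment: the following sketch shows the interval-generation step and a loop in the style of Method 1, written in C and assuming a POSIX system.  drand48() supplies the uniform (pseudo-)random numbers Ui; the choice of lambda and the seed are purely illustrative, and the 'measurement' itself is only a placeholder.

    #include <math.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    /* Inverse-transform generation of an exponential interval:
     * Ei = -log(Ui) / lambda, with Ui uniform on (0, 1). */
    static double exp_interval(double lambda)
    {
        double u;
        do {
            u = drand48();
        } while (u == 0.0);              /* avoid log(0) */
        return -log(u) / lambda;
    }

    int main(void)
    {
        double lambda = 1.0 / 30.0;      /* mean sampling interval: 30 s */
        int i;

        srand48(42);                     /* illustrative seed only */
        for (i = 0; i < 5; i++) {
            double e = exp_interval(lambda);
            struct timespec ts;

            ts.tv_sec  = (time_t)e;
            ts.tv_nsec = (long)((e - (double)ts.tv_sec) * 1e9);
            nanosleep(&ts, NULL);        /* "generate Ei and wait that long" */
            printf("measurement %d\n", i);   /* "perform a measurement" */
        }
        return 0;
    }

Note that this sketch inherits the limitation of Method 1 discussed next: the time spent performing the measurement itself is not accounted for.}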
1013 ID Framework for IP Performance Metrics November 1997 1015 The problem with this approach is that the "Perform a measurement" 1016 steps themselves take time, so the sampling is not done at times E1, 1017 E1+E2, etc., but rather at E1, E1+M1+E2, etc., where Mi is the amount 1018 of time required for the i'th measurement. If Mi is very small com- 1019 pared to 1/lambda then the potential error introduced by this tech- 1020 nique is likewise small. As Mi becomes a non-negligible fraction of 1021 1/lambda, the potential error increases. 1023 Method 2 attempts to correct this error by taking into account the 1024 amount of time required by the measurements (i.e., the Mi's) and 1025 adjusting the waiting intervals accordingly: 1026 1. Generate E1 and wait that long. 1027 2. Perform a measurement and measure M1, the time it took to do so. 1028 3. Generate E2 and wait for a time E2-M1. 1029 4. Perform a measurement and measure M2 .. 1031 This approach works fine as long as E{i+1} >= Mi. But if E{i+1} < Mi 1032 then it is impossible to wait the proper amount of time. (Note that 1033 this case corresponds to needing to perform two measurements simul- 1034 taneously.) 1036 Method 3 is generating a schedule of measurement times E1, E1+E2, 1037 etc., and then sticking to it: 1038 1. Generate E1, E2, ..., En. 1039 2. Compute measurement times T1, T2, ..., Tn, as Ti = E1 + ... + Ei. 1040 3. Arrange that at times T1, T2, ..., Tn, a measurement is made. 1042 By allowing simultaneous measurements, Method 3 avoids the shortcom- 1043 ings of Methods 1 and 2. If, however, simultaneous measurements 1044 interfere with one another, then Method 3 does not gain any benefit 1045 and may actually prove worse than Methods 1 or 2. 1047 For Internet phenomena, it is not known to what degree the inaccura- 1048 cies of these methods are significant. If the Mi's are much less 1049 than 1/lambda, then any of the three should suffice. If the Mi's are 1050 less than 1/lambda but perhaps not greatly less, then Method 2 is 1051 preferred to Method 1. If simultaneous measurements do not interfere 1052 with one another, then Method 3 is preferred, though it can be con- 1053 siderably harder to implement. 1055 10.2. Self-Consistency 1057 A fundamental requirement for a sound measurement methodology is that 1058 measurement be made using as few unconfirmed assumptions as possible. 1059 Experience has painfully shown how easy it is to make an (often 1060 implicit) assumption that turns out to be incorrect. An example is 1061 incorporating into a measurement the reading of a clock synchronized 1063 ID Framework for IP Performance Metrics November 1997 1065 to a highly accurate source. It is easy to assume that the clock is 1066 therefore accurate; but due to software bugs, a loss of power in the 1067 source, or a loss of communication between the source and the clock, 1068 the clock could actually be quite inaccurate. 1070 This is not to argue that one must not make *any* assumptions when 1071 measuring, but rather that, to the extent which is practical, assump- 1072 tions should be tested. One powerful way for doing so involves 1073 checking for self-consistency. Such checking applies both to the 1074 observed value(s) of the measurement *and the values used by the 1075 measurement process itself*. A simple example of the former is that 1076 when computing a round trip time, one should check to see if it is 1077 negative. 
Since negative time intervals are non-physical, if it ever 1078 is negative that finding immediately flags an error. *These sorts of 1079 errors should then be investigated!* It is crucial to determine where 1080 the error lies, because only by doing so diligently can we build up 1081 faith in a methodology's fundamental soundness. For example, it 1082 could be that the round trip time is negative because during the 1083 measurement the clock was set backward in the process of synchroniz- 1084 ing it with another source. But it could also be that the measure- 1085 ment program accesses uninitialized memory in one of its computations 1086 and, only very rarely, that leads to a bogus computation. This 1087 second error is more serious, if the same program is used by others 1088 to perform the same measurement, since then they too will suffer from 1089 incorrect results. Furthermore, once uncovered it can be completely 1090 fixed. 1092 A more subtle example of testing for self-consistency comes from 1093 gathering samples of one-way Internet delays. If one has a large 1094 sample of such delays, it may well be highly telling to, for example, 1095 fit a line to the pairs of (time of measurement, measured delay), to 1096 see if the resulting line has a clearly non-zero slope. If so, a 1097 possible interpretation is that one of the clocks used in the meas- 1098 urements is skewed relative to the other. Another interpretation is 1099 that the slope is actually due to genuine network effects. Determin- 1100 ing which is indeed the case will often be highly illuminating. (See 1101 [Pa97] for a discussion of distinguishing between relative clock skew 1102 and genuine network effects.) Furthermore, if making this check is 1103 part of the methodology, then a finding that the long-term slope is 1104 very near zero is positive evidence that the measurements are prob- 1105 ably not biased by a difference in skew. 1107 A final example illustrates checking the measurement process itself 1108 for self-consistency. Above we outline Poisson sampling techniques, 1109 based on generating exponentially-distributed intervals. A sound 1110 measurement methodology would include testing the generated intervals 1111 to see whether they are indeed exponentially distributed (and also to 1112 see if they suffer from correlation). In the appendix we discuss and 1114 ID Framework for IP Performance Metrics November 1997 1116 give C code for one such technique, a general-purpose, well-regarded 1117 goodness-of-fit test called the Anderson-Darling test. 1119 Finally, we note that what is truly relevant for Poisson sampling of 1120 Internet metrics is often not when the measurements began but the 1121 wire times corresponding to the measurement process. These could 1122 well be different, due to complications on the hosts used to perform 1123 the measurement. Thus, even those with complete faith in their 1124 pseudo-random number generators and subsequent algorithms are 1125 encouraged to consider how they might test the assumptions of each 1126 measurement procedure as much as possible. 1128 10.3. Defining Statistical Distributions 1130 One way of describing a collection of measurements (a sample) is as a 1131 statistical distribution -- informally, as percentiles. There are 1132 several slightly different ways of doing so. In this section we 1133 define a standard definition to give uniformity to these descrip- 1134 tions. 
The "empirical distribution function" (EDF) of a set of scalar
measurements is a function F(x) which for any x gives the fractional
proportion of the total measurements that were <= x.  If x is less
than the minimum value observed, then F(x) is 0.  If it is greater
than or equal to the maximum value observed, then F(x) is 1.

For example, given the 6 measurements:
    -2, 7, 7, 4, 18, -5
we have F(-8) = 0, F(-5) = 1/6, F(-5.0001) = 0, F(-4.999) = 1/6,
F(7) = 5/6, F(18) = 1, F(239) = 1.

Note that we can recover from F(x) the different measured values and
how many times each occurred -- no information regarding the range in
values is lost.  Summarizing measurements using histograms, on the
other hand, in general loses information about the different values
observed, so the EDF is preferred.

Using either the EDF or a histogram, however, we do lose information
regarding the order in which the values were observed.  Whether this
loss is potentially significant will depend on the metric being
measured.

We will use the term "percentile" to refer to the smallest value of x
for which F(x) >= a given percentage.  So the 50th percentile of the
example above is 4, since F(4) = 3/6 = 50%; the 25th percentile is
-2, since F(-5) = 1/6 < 25%, and F(-2) = 2/6 >= 25%; the 100th
percentile is 18; the 15th percentile is -5, since F(-5) = 1/6 >=
15%; and the 0th percentile is -infinity.

Care must be taken when using percentiles to summarize a sample,
because they can lend an unwarranted appearance of more precision
than is really available.  Any such summary MUST include the sample
size N, because any percentile difference finer than 1/N is below the
resolution of the sample.

See [DS86] for more details regarding EDF's.

We close with a note on the common (and important!) notion of median.
In statistics, the median of a distribution is defined to be the
point X for which the probability of observing a value <= X is equal
to the probability of observing a value > X.  When estimating the
median of a set of observations, the estimate depends on whether the
number of observations, N, is odd or even:
+ If N is odd, then the 50th percentile as defined above is used as
  the estimated median.
+ If N is even, then the estimated median is the average of the
  central two observations; that is, if the observations are sorted
  in ascending order and numbered from 1 to N, where N = 2*K, then
  the estimated median is the average of the (K)'th and (K+1)'th
  observations.
Usually the term "estimated" is dropped from the phrase "estimated
median" and this value is simply referred to as the "median".

10.4. Testing For Goodness-of-Fit

For some forms of measurement calibration we need to test whether a
set of numbers is consistent with those numbers having been drawn
from a particular distribution.  An example is that to apply a self-
consistency check to measurements made using a Poisson process, one
test is to see whether the spacing between the sampling times does
indeed reflect an exponential distribution; or, if the N/dT approach
discussed above was used, whether the times are uniformly distributed
across [T, T+dT].
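Using the Anderson-Darling routines given in the appendix, such a
check can be phrased in a small amount of code.  The sketch below is
illustrative only: exp_A2_known_mean() is the appendix routine (it
returns a significance level, or -1 if the test cannot be applied, and
sorts the array as a side effect), while check_poisson_gaps(), the
mean_gap parameter, and the 5% threshold policy are hypothetical
choices made here for illustration, not part of any defined metric.

    #include <stdio.h>

    /* From the appendix: A^2 significance level, or -1 if the test
     * could not be applied; sorts x[0..n-1] as a side effect.
     */
    extern double exp_A2_known_mean(double x[], int n, double mean);

    /* Illustrative self-check: are the recorded gaps between
     * sampling times consistent with an exponential distribution of
     * the intended mean (1/lambda)?  Returns 1 if consistent at 5%
     * significance, 0 if not, -1 if the test could not be applied.
     */
    int
    check_poisson_gaps(double gaps[], int n, double mean_gap)
    {
        double sig = exp_A2_known_mean(gaps, n, mean_gap);

        if (sig < 0.0)
            return -1;          /* e.g., fewer than 5 values */
        if (sig < 0.05) {
            fprintf(stderr,
                "sampling gaps inconsistent with exponential "
                "distribution (significance %.3f)\n", sig);
            return 0;
        }
        return 1;
    }

For the N/dT approach, unif_A2_known_range() would be used
analogously, testing the sampling times against the uniform
distribution over [T, T+dT].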
1203 {Comment: There are at least three possible sets of values we could 1204 test: the scheduled packet transmission times, as determined by use 1205 of a pseudo-random number generator; user-level timestamps made just 1206 before or after the system call for transmitting the packet; and wire 1207 times for the packets as recorded using a packet filter. All three 1208 of these are potentially informative: failures for the scheduled 1209 times to match an exponential distribution indicate inaccuracies in 1210 the random number generation; failures for the user-level times indi- 1211 cate inaccuracies in the timers used to schedule transmission; and 1212 failures for the wire times indicate inaccuracies in actually 1214 ID Framework for IP Performance Metrics November 1997 1216 transmitting the packets, perhaps due to contention for a shared 1217 resource.} 1219 There are a large number of statistical goodness-of-fit techniques 1220 for performing such tests. See [DS86] for a thorough discussion. 1221 That reference recommends the Anderson-Darling EDF test as being a 1222 good all-purpose test, as well as one that is especially good at 1223 detecting deviations from a given distribution in the lower and upper 1224 tails of the EDF. 1226 It is important to understand that the nature of goodness-of-fit 1227 tests is that one first selects a "significance level", which is the 1228 probability that the test will erroneously declare that the EDF of a 1229 given set of measurements fails to match a particular distribution 1230 when in fact the measurements do indeed reflect that distribution. 1231 Unless otherwise stated, IPPM goodness-of-fit tests are done using 5% 1232 significance. This means that if the test is applied to 100 samples 1233 and 5 of those samples are deemed to have failed the test, then the 1234 samples are all consistent with the distribution being tested. If 1235 significantly more of the samples fail the test, then the assumption 1236 that the samples are consistent with the distribution being tested 1237 must be rejected. If significantly fewer of the samples fail the 1238 test, then the samples have potentially been doctored too well to fit 1239 the distribution. Similarly, some goodness-of-fit tests (including 1240 Anderson-Darling) can detect whether it is likely that a given sample 1241 was doctored. We also use a significance of 5% for this case; that 1242 is, the test will report that a given honest sample is "too good to 1243 be true" 5% of the time, so if the test reports this finding signifi- 1244 cantly more often than one time out of twenty, it is an indication 1245 that something unusual is occurring. 1247 The appendix gives sample C code for implementing the Anderson- 1248 Darling test, as well as further discussing its use. 1250 See [Pa94] for a discussion of goodness-of-fit and closeness-of-fit 1251 tests in the context of network measurement. 1253 11. Avoiding Stochastic Metrics 1255 When defining metrics applying to a path, subpath, cloud, or other 1256 network element, we in general do not define them in stochastic terms 1257 (probabilities). We instead prefer a deterministic definition. So, 1258 for example, rather than defining a metric about a "packet loss pro- 1259 bability between A and B", we would define a metric about a "packet 1260 loss rate between A and B". (A measurement given by the first defin- 1261 ition might be "0.73", and by the second "73 packets out of 100".) 
1263 ID Framework for IP Performance Metrics November 1997 1265 The reason for this distinction is as follows. When definitions are 1266 made in terms of probabilities, there are often hidden assumptions in 1267 the definition about a stochastic model of the behavior being meas- 1268 ured. The fundamental goal with avoiding probabilities in our metric 1269 definitions is to avoid biasing our definitions by these hidden 1270 assumptions. 1272 For example, an easy hidden assumption to make is that packet loss in 1273 a network component due to queueing overflows can be described as 1274 something that happens to any given packet with a particular proba- 1275 bility. Usually, however, queueing drops are actually *determinis- 1276 tic*, and assuming that they should be described probabilistically 1277 can obscure crucial correlations between queueing drops among a set 1278 of packets. So it's better to explicitly note stochastic assump- 1279 tions, rather than have them sneak into our definitions implicitly. 1281 This does *not* mean that we abandon stochastic models for under- 1282 standing network performance! It only means that when defining IP 1283 metrics we avoid terms such as "probability" for terms like "propor- 1284 tion" or "rate". We will still use, for example, random sampling in 1285 order to estimate probabilities used by stochastic models related to 1286 the IP metrics. We also do not rule out the possibility of stochas- 1287 tic metrics when they are truly appropriate (for example, perhaps to 1288 model transmission errors caused by certain types of line noise). 1290 12. Packets of Type P 1292 A fundamental property of many Internet metrics is that the value of 1293 the metric depends on the type of IP packet(s) used to make the meas- 1294 urement. Consider an IP-connectivity metric: one obtains different 1295 results depending on whether one is interested in connectivity for 1296 packets destined for well-known TCP ports or unreserved UDP ports, or 1297 those with invalid IP checksums, or those with TTL's of 16, for exam- 1298 ple. In some circumstances these distinctions will be highly 1299 interesting (for example, in the presence of firewalls, or RSVP 1300 reservations). 1302 Because of this distinction, we introduce the generic notion of a 1303 "packet of type P", where in some contexts P will be explicitly 1304 defined (i.e., exactly what type of packet we mean), partially 1305 defined (e.g., "with a payload of B octets"), or left generic. Thus 1306 we may talk about generic IP-type-P-connectivity or more specific 1307 IP-port-HTTP-connectivity. Some metrics and methodologies may be 1308 fruitfully defined using generic type P definitions which are then 1309 made specific when performing actual measurements. 1311 Whenever a metric's value depends on the type of the packets involved 1313 ID Framework for IP Performance Metrics November 1997 1315 in the metric, the metric's name will include either a specific type 1316 or a phrase such as "type-P". Thus we will not define an "IP- 1317 connectivity" metric but instead an "IP-type-P-connectivity" metric 1318 and/or perhaps an "IP-port-HTTP-connectivity" metric. This naming 1319 convention serves as an important reminder that one must be conscious 1320 of the exact type of traffic being measured. 1322 A closely related note: it would be very useful to know if a given 1323 Internet component treats equally a class C of different types of 1324 packets. 
If so, then any one of those types of packets can be used for
subsequent measurement of the component.  This suggests we devise a
metric or suite of metrics that attempt to determine C.

13. Internet Addresses vs. Hosts

When considering a metric for some path through the Internet, it is
often natural to think about it as being for the path from Internet
host H1 to host H2.  A definition in these terms, though, can be
ambiguous, because Internet hosts can be attached to more than one
network.  In this case, the result of the metric will depend on which
of these networks is actually used.

Because of this ambiguity, usually such definitions should instead be
in terms of Internet IP addresses.  For the common case of a
unidirectional path through the Internet, we will use the term "Src"
to denote the IP address of the beginning of the path, and "Dst" to
denote the IP address of the end.

14. Standard-Formed Packets

Unless otherwise stated, all metric definitions that concern IP
packets include an implicit assumption that the packet is *standard
formed*.  A packet is standard formed if it meets all of the
following criteria:
+ Its length as given in the IP header corresponds to the size of
  the IP header plus the size of the payload.
+ It includes a valid IP header: the version field is 4 (later, we
  will expand this to include 6); the header length is >= 5; the
  checksum is correct.
+ It is not an IP fragment.
+ The source and destination addresses correspond to the hosts in
  question.
+ Either the packet possesses sufficient TTL to travel from the
  source to the destination if the TTL is decremented by one at each
  hop, or it possesses the maximum TTL of 255.
+ It does not contain IP options unless explicitly noted.
+ If a transport header is present, it too contains a valid checksum
  and other valid fields.
We further require that if a packet is described as having a "length
of B octets", then 0 <= B <= 65535; and if B is the payload length in
octets, then B <= (65535 - IP header size in octets).

So, for example, one might imagine defining an IP connectivity metric
as "IP-type-P-connectivity for standard-formed packets with the IP
TOS field set to 0", or, more succinctly, "IP-type-P-connectivity
with the IP TOS field set to 0", since standard-formed is already
implied by convention.

A particular type of standard-formed packet often useful to consider
is the "minimal IP packet from A to B" -- this is an IP packet with
the following properties:
+ It is standard-formed.
+ Its data payload is 0 octets.
+ It contains no options.
(Note that we do not define its protocol field, as different values
may lead to different treatment by the network.)

When defining IP metrics we keep in mind that no packet smaller or
simpler than this can be transmitted over a correctly operating IP
network.

15. Acknowledgements

The comments of Brian Carpenter, Bill Cerveny, Padma Krishnaswamy,
Jeff Sedayao, and Howard Stanislevic are appreciated.

16. Security Considerations

This memo raises no security issues.

17.
Appendix 1402 Below we give routines written in C for computing the Anderson- 1403 Darling test statistic (A2) for determining whether a set of values 1404 is consistent with a given statistical distribution. Externally, the 1405 two main routines of interest are: 1406 double exp_A2_known_mean(double x[], int n, double mean) 1407 double unif_A2_known_range(double x[], int n, 1408 double min_val, double max_val) 1409 Both take as their first argument, x, the array of n values to be 1410 tested. (Upon return, the elements of x are sorted.) The remaining 1411 parameters characterize the distribution to be used: either the mean 1412 (1/lambda), for an exponential distribution, or the lower and upper 1413 bounds, for a uniform distribution. The names of the routines stress 1414 that these values must be known in advance, and *not* estimated from 1415 the data (for example, by computing its sample mean). Estimating the 1416 parameters from the data *changes* the significance level of the test 1417 statistic. While [DS86] gives alternate significance tables for some 1418 instances in which the parameters are estimated from the data, for 1419 our purposes we expect that we should indeed know the parameters in 1420 advance, since what we will be testing are generally values such as 1421 packet sending times that we wish to verify follow a known distribu- 1422 tion. 1424 Both routines return a significance level, as described earlier. This 1425 is a value between 0 and 1. The correct use of the routines is to 1426 pick in advance the threshold for the significance level to test; 1427 generally, this will be 0.05, corresponding to 5%, as also described 1428 above. Subsequently, if the routines return a value strictly less 1429 than this threshold, then the data are deemed to be inconsistent with 1430 the presumed distribution, *subject to an error corresponding to the 1431 significance level*. That is, for a significance level of 5%, 5% of 1432 the time data that is indeed drawn from the presumed distribution 1433 will be erroneously deemed inconsistent. 1435 Thus, it is important to bear in mind that if these routines are used 1436 frequently, then one will indeed encounter occasional failures, even 1437 if the data is unblemished. 1439 Another important point concerning significance levels is that it is 1440 unsound to compare them in order to determine which of two sets of 1441 values is a "better" fit to a presumed distribution. Such testing 1442 should instead be done using "closeness-of-fit metrics" such as the 1443 lambda^2 metric described in [Pa94]. 1445 While the routines provided are for exponential and uniform distribu- 1446 tions with known parameters, it is generally straight-forward to 1447 write comparable routines for any distribution with known parameters. 1449 ID Framework for IP Performance Metrics November 1997 1451 The heart of the A2 tests lies in a statistic computed for testing 1452 whether a set of values is consistent with a uniform distribution 1453 between 0 and 1, which we term Unif(0, 1). If we wish to test 1454 whether a set of values, X, is consistent with a given distribution 1455 G(x), we first compute 1456 Y = G_inverse(X) 1457 If X is indeed distributed according to G(x), then Y will be distri- 1458 buted according to Unif(0, 1); so by testing Y for consistency with 1459 Unif(0, 1), we also test X for consistency with G(x). 1461 We note, however, that the process of computing Y above might yield 1462 values of Y outside the range (0..1). 
Such values should not occur if X is indeed distributed according to
G(x), but easily can occur if it is not.  In the latter case, we need
to avoid computing the central A2 statistic, since floating-point
exceptions may occur if any of the values lie outside (0..1).
Accordingly, the routines check for this possibility, and, if it is
encountered, return a raw A2 statistic of -1.  The routine that
converts the raw A2 statistic to a significance level likewise
propagates this value, returning a significance level of -1.  So, any
use of these routines must be prepared for a possible negative
significance level.

The last important point regarding use of the A2 statistic concerns
n, the number of values being tested.  If n < 5 then the test is not
meaningful, and in this case a significance level of -1 is returned.

On the other hand, for "real" data the test *gains* power as n
becomes larger.  It is well known in the statistics community that
real data almost never exactly matches a theoretical distribution,
even in cases such as rolling dice a great many times (see [Pa94] for
a brief discussion and references).  The A2 test is sensitive enough
that, for sufficiently large sets of real data, the test will almost
always fail, because it will manage to detect slight imperfections in
the fit of the data to the distribution.

For example, we have found that when testing 8,192 measured wire
times for packets sent at Poisson intervals, the measurements almost
always fail the A2 test.  On the other hand, testing 128 measurements
failed at 5% significance only about 5% of the time, as expected.
Thus, in general, when the test fails, care must be taken to
understand why it failed.

The remainder of this appendix gives C code for the routines
mentioned above.

/* Routines for computing the Anderson-Darling A2 test statistic.
 *
 * Implemented based on the description in "Goodness-of-Fit
 * Techniques," R. D'Agostino and M. Stephens, editors,
 * Marcel Dekker, Inc., 1986.
 */

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

/* Returns the raw A^2 test statistic for n sorted samples
 * z[0] .. z[n-1], for z ~ Unif(0,1).
 */
extern double compute_A2(double z[], int n);

/* Returns the significance level associated with an A^2 test
 * statistic value of A2, assuming no parameters of the tested
 * distribution were estimated from the data.
 */
extern double A2_significance(double A2);

/* Returns the A^2 significance level for testing n observations
 * x[0] .. x[n-1] against an exponential distribution with the
 * given mean.
 *
 * SIDE EFFECT: the x[0..n-1] are sorted upon return.
 */
extern double exp_A2_known_mean(double x[], int n, double mean);

/* Returns the A^2 significance level for testing n observations
 * x[0] .. x[n-1] against the uniform distribution [min_val, max_val].
 *
 * SIDE EFFECT: the x[0..n-1] are sorted upon return.
 */
extern double unif_A2_known_range(double x[], int n,
                double min_val, double max_val);

/* Returns a pseudo-random number distributed according to an
 * exponential distribution with the given mean.
 */
extern double random_exponential(double mean);

/* Helper function used by qsort() to sort double-precision
 * floating-point values.
 */
static int
compare_double(const void *v1, const void *v2)
{
    double d1 = *(double *) v1;
    double d2 = *(double *) v2;

    if (d1 < d2)
        return -1;
    else if (d1 > d2)
        return 1;
    else
        return 0;
}

double
compute_A2(double z[], int n)
{
    int i;
    double sum = 0.0;

    if ( n < 5 )
        /* Too few values. */
        return -1.0;

    /* If any of the values are outside the range (0, 1) then
     * fail immediately (and avoid a possible floating point
     * exception in the code below).
     */
    for (i = 0; i < n; ++i)
        if ( z[i] <= 0.0 || z[i] >= 1.0 )
            return -1.0;

    /* Page 101 of D'Agostino and Stephens. */
    for (i = 1; i <= n; ++i) {
        sum += (2 * i - 1) * log(z[i-1]);
        sum += (2 * n + 1 - 2 * i) * log(1.0 - z[i-1]);
    }
    return -n - (1.0 / n) * sum;
}

double
A2_significance(double A2)
{
    /* Page 105 of D'Agostino and Stephens. */
    if (A2 < 0.0)
        return A2;  /* Bogus A2 value - propagate it. */

    /* Check for possibly doctored values. */
    if (A2 <= 0.201)
        return 0.99;
    else if (A2 <= 0.240)
        return 0.975;
    else if (A2 <= 0.283)
        return 0.95;
    else if (A2 <= 0.346)
        return 0.90;
    else if (A2 <= 0.399)
        return 0.85;

    /* Now check for possible inconsistency. */
    if (A2 <= 1.248)
        return 0.25;
    else if (A2 <= 1.610)
        return 0.15;
    else if (A2 <= 1.933)
        return 0.10;
    else if (A2 <= 2.492)
        return 0.05;
    else if (A2 <= 3.070)
        return 0.025;
    else if (A2 <= 3.880)
        return 0.01;
    else if (A2 <= 4.500)
        return 0.005;
    else if (A2 <= 6.000)
        return 0.001;
    else
        return 0.0;
}

double
exp_A2_known_mean(double x[], int n, double mean)
{
    int i;
    double A2;

    /* Sort the first n values. */
    qsort(x, n, sizeof(x[0]), compare_double);

    /* Assuming they match an exponential distribution, transform
     * them to Unif(0,1).
     */
    for (i = 0; i < n; ++i) {
        x[i] = 1.0 - exp(-x[i] / mean);
    }

    /* Now make the A^2 test to see if they're truly uniform. */
    A2 = compute_A2(x, n);
    return A2_significance(A2);
}

double
unif_A2_known_range(double x[], int n, double min_val, double max_val)
{
    int i;
    double A2;
    double range = max_val - min_val;

    /* Sort the first n values. */
    qsort(x, n, sizeof(x[0]), compare_double);

    /* Transform Unif(min_val, max_val) to Unif(0,1). */
    for (i = 0; i < n; ++i)
        x[i] = (x[i] - min_val) / range;

    /* Now make the A^2 test to see if they're truly uniform. */
    A2 = compute_A2(x, n);
    return A2_significance(A2);
}

double
random_exponential(double mean)
{
    return -mean * log1p(-drand48());
}

18. References

[AK96] G. Almes and S. Kalidindi, "A One-way Delay Metric for IPPM",
   Internet Draft, November 1997.

[BM92] I. Bilinskis and A. Mikelsons, Randomized Signal Processing,
   Prentice Hall International, 1992.

[DS86] R. D'Agostino and M. Stephens, editors, Goodness-of-Fit
   Techniques, Marcel Dekker, Inc., 1986.

[CPB93] K. Claffy, G. Polyzos, and H-W. Braun, ``Application of
   Sampling Methodologies to Network Traffic Characterization,''
   Proc. SIGCOMM '93, pp. 194-203, San Francisco, September 1993.

[FJ94] S. Floyd and V. Jacobson, ``The Synchronization of Periodic
   Routing Messages,'' IEEE/ACM Transactions on Networking, 2(2),
   pp. 122-136, April 1994.

[Mi92] D. Mills, "Network Time Protocol (Version 3) Specification,
   Implementation and Analysis", RFC 1305, March 1992.

[Pa94] V. Paxson, ``Empirically-Derived Analytic Models of Wide-Area
   TCP Connections,'' IEEE/ACM Transactions on Networking, 2(4),
   pp. 316-336, August 1994.

[Pa96] V. Paxson, ``Towards a Framework for Defining Internet
   Performance Metrics,'' Proceedings of INET '96,
   ftp://ftp.ee.lbl.gov/papers/metrics-framework-INET96.ps.Z

[Pa97] V. Paxson, ``Measurements and Analysis of End-to-End Internet
   Dynamics,'' Ph.D. dissertation, U.C. Berkeley, 1997,
   ftp://ftp.ee.lbl.gov/papers/vp-thesis/dis.ps.gz.

19. Authors' Addresses

Vern Paxson
MS 50B/2239
Lawrence Berkeley National Laboratory
University of California
Berkeley, CA 94720
USA
Phone: +1 510/486-7504

Guy Almes
Advanced Network & Services, Inc.
200 Business Park Drive
Armonk, NY 10504
USA
Phone: +1 914/273-7863

Jamshid Mahdavi
Pittsburgh Supercomputing Center
4400 5th Avenue
Pittsburgh, PA 15213
USA
Phone: +1 412/268-6282

Matt Mathis
Pittsburgh Supercomputing Center
4400 5th Avenue
Pittsburgh, PA 15213
USA
Phone: +1 412/268-3319