idnits 2.17.1 

draft-ietf-ippm-framework-02.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Cannot find the required boilerplate sections (Copyright, IPR, etc.) in
     this document.

     Expected boilerplate is as follows today (2024-04-19) according to
     https://trustee.ietf.org/license-info :

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.a:
        This Internet-Draft is submitted in full conformance with the provisions
        of BCP 78 and BCP 79.

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 2:
        Copyright (c) 2024 IETF Trust and the persons identified as the document
        authors.  All rights reserved.

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 3:
        This document is subject to BCP 78 and the IETF Trust's Legal Provisions
        Relating to IETF Documents
        (https://trustee.ietf.org/license-info) in effect on the date of
        publication of this document.  Please review these documents
        carefully, as they describe your rights and restrictions with
        respect to this document.  Code Components extracted from this
        document must include Simplified BSD License text as described in
        Section 4.e of the Trust Legal Provisions and are provided
        without warranty as described in the Simplified BSD License.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** Missing expiration date.  The document expiration date should appear on
     the first and last page.

  ** The document seems to lack a 1id_guidelines paragraph about
     Internet-Drafts being working documents. 

  ** The document seems to lack a 1id_guidelines paragraph about 6 months
     document validity. 

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     current Internet-Drafts. 

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     Shadow Directories. 

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  == The page length should not exceed 58 lines per page, but there was 37
     longer pages, the longest (page 2) being 60 lines

  == It seems as if not all pages are separated by form feeds - found 0 form
     feeds but 38 pages


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an Abstract section.

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.

  ** There are 685 instances of weird spacing in the document.  Is it really
     formatted ragged-right, rather than justified?

  ** There are 2 instances of too long lines in the document, the longest one
     being 1 character in excess of 72.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == Line 13 has weird spacing: '... Drafts  are  ...'

  == Line 14 has weird spacing: '...cuments  of  t...'

  == Line 15 has weird spacing: '...ups may  also ...'

  == Line 23 has weird spacing: '... please  check...'

  == Line 25 has weird spacing: '...ctories  on  f...'

  == (680 more instances...)

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (February 1998) is 9560 days in the past.  Is this
     intentional?

  -- Found something which looks like a code comment -- if you have code
     sections in the document, please surround them with '<CODE BEGINS>' and
     '<CODE ENDS>' lines.


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Looks like a reference, but probably isn't: '0' on line 1720

  == Unused Reference: 'Pa96' is defined on line 1765, but no explicit
     reference was found in the text

  -- Possible downref: Non-RFC (?) normative reference: ref. 'AK96'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'BM92'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'DS86'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'CPB93'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'FJ94'

  ** Obsolete normative reference: RFC 1305 (ref. 'Mi92') (Obsoleted by RFC
     5905)

  -- Possible downref: Non-RFC (?) normative reference: ref. 'Pa94'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'Pa96'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'Pa97'


     Summary: 12 errors (**), 0 flaws (~~), 10 warnings (==), 12 comments
     (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group          V. Paxson, Lawrence Berkeley National Lab
3	Internet Draft                     G. Almes, Advanced Network & Services
4	                             J. Mahdavi, Pittsburgh Supercomputer Center
5	                              M. Mathis, Pittsburgh Supercomputer Center
6	Expiration Date: July 1998                                 February 1998

8	                  Framework for IP Performance Metrics
9	                   <draft-ietf-ippm-framework-02.txt>

11	1. Status of this Memo

13	   This document is an Internet  Draft.   Internet  Drafts  are  working
14	   documents  of  the Internet Engineering Task Force (IETF), its areas,
15	   and its working groups.  Note that other groups may  also  distribute
16	   working documents as Internet Drafts.

18	   Internet Drafts are draft  documents  valid  for  a  maximum  of  six
19	   months, and may be updated, replaced, or obsoleted by other documents
20	   at any time.  It is inappropriate to use Internet Drafts as reference
21	   material or to cite them other than as ``work in progress''.

23	   To learn the current status of any Internet Draft, please  check  the
24	   ``1id-abstracts.txt'' listing contained in the Internet Drafts shadow
25	   directories  on  ftp.is.co.za   (Africa),   nic.nordu.net   (Europe),
26	   munnari.oz.au  (Pacific  Rim),  ds.internic.net  (US  East Coast), or
27	   ftp.isi.edu (US West Coast).

29	   This memo provides information for the Internet community.  This memo
30	   does  not  specify an Internet standard of any kind.  Distribution of
31	   this memo is unlimited.

33	ID                Framework for IP Performance Metrics     February 1998

35	Table of Contents

37	   1.  STATUS OF THIS MEMO.............................................1
38	   2.  INTRODUCTION....................................................2
39	   3.  CRITERIA FOR IP PERFORMANCE METRICS.............................4
40	   4.  TERMINOLOGY FOR PATHS AND CLOUDS................................4
41	   5.  FUNDAMENTAL CONCEPTS............................................5
42	     5.1  Metrics......................................................5
43	     5.2  Measurement Methodology......................................6
44	     5.2  Measurements, Uncertainties, and Errors......................8
45	   6.  METRICS AND THE ANALYTICAL FRAMEWORK............................9
46	   7.  EMPIRICALLY SPECIFIED METRICS..................................11
47	   8.  TWO FORMS OF COMPOSITION.......................................12
48	     8.1  Spatial Composition of Metrics..............................12
49	     8.2  Temporal Composition of Formal Models and Empirical Metrics.13
50	   9.  ISSUES RELATED TO TIME.........................................14
51	     9.1  Clock Issues................................................14
52	     9.2  The Notion of "Wire Time"...................................17
53	   10. SINGLETONS, SAMPLES, AND STATISTICS............................19
54	     10.1  Methods of Collecting Samples..............................20
55	       10.1.1 Poisson Sampling........................................21
56	       10.1.2 Geometric Sampling......................................22
57	       10.1.3 Generating Poisson Sampling Intervals...................22
58	     10.2  Self-Consistency...........................................23
59	     10.3  Defining Statistical Distributions.........................25
60	     10.4  Testing For Goodness-of-Fit................................26
61	   11. AVOIDING STOCHASTIC METRICS....................................27
62	   12. PACKETS OF TYPE P..............................................28
63	   13. INTERNET ADDRESSES VS. HOSTS...................................29
64	   14. STANDARD-FORMED PACKETS........................................29
65	   15. ACKNOWLEDGEMENTS...............................................30
66	   16. SECURITY CONSIDERATIONS........................................31
67	   17. APPENDIX.......................................................31
68	   18. REFERENCES.....................................................37
69	   19. AUTHORS' ADDRESSES.............................................37

71	2. Introduction

73	   The purpose of this  memo  is  to  define  a  general  framework  for
74	   particular  metrics  to  be  developed  by  the IETF's IP Performance
75	   Metrics effort, begun by the Benchmarking Methodology  Working  Group
76	   (BMWG)  of  the Operational Requirements Area, and being continued by
77	   the IP Performance Metrics Working  Group  (IPPM)  of  the  Transport
78	   Area.

80	   We begin by laying out several  criteria  for  the  metrics  that  we
81	   adopt.   These  criteria  are designed to promote an IPPM effort that

83	ID                Framework for IP Performance Metrics     February 1998

85	   will maximize an accurate common understanding by Internet users  and
86	   Internet  providers  of  the performance and reliability both of end-
87	   to-end paths through the Internet and of specific  'IP  clouds'  that
88	   comprise portions of those paths.

90	   We next define some Internet vocabulary that will allow us  to  speak
91	   clearly about Internet components such as routers, paths, and clouds.

93	   We then define the fundamental concepts of 'metric' and  'measurement
94	   methodology',  which  allow  us  to  speak  clearly about measurement
95	   issues.  Given these concepts, we proceed to  discuss  the  important
96	   issue  of  measurement  uncertainties  and errors, and develop a key,
97	   somewhat subtle notion of how they relate to the analytical framework
98	   shared  by  many  aspects of the Internet engineering discipline.  We
99	   then introduce the notion of empirically defined metrics, and  finish
100	   this  part  of  the document with a general discussion of how metrics
101	   can be 'composed'.

103	   The remainder of the document deals with a variety of issues  related
104	   to  defining  sound  metrics  and  methodologies:   how  to deal with
105	   imperfect clocks; the notion of 'wire time' as  distinct  from  'host
106	   time';  how  to  aggregate sets of singleton metrics into samples and
107	   derive sound statistics from those samples; why it is recommended  to
108	   avoid thinking about Internet properties in probabilistic terms (such
109	   as the probability that a packet is dropped), since these terms often
110	   include  implicit  assumptions  about  how  the  network behaves; the
111	   utility of defining metrics in terms of packets of  a  generic  type;
112	   the  benefits  of  preferring IP addresses to DNS host names; and the
113	   notion of  'standard-formed'  packets.   An  appendix  discusses  the
114	   Anderson-Darling  test  for gauging whether a set of values matches a
115	   given  statistical  distribution,   and   gives   C   code   for   an
116	   implementation of the test.

118	   In some sections of the memo, we will surround some  commentary  text
119	   with the brackets {Comment: ... }.  We stress that this commentary is
120	   only commentary, and is not itself part of the framework document  or
121	   a proposal of particular metrics.  In some cases this commentary will
122	   discuss some of the properties of metrics that might  be  envisioned,
123	   but  the  reader  should  assume that any such discussion is intended
124	   only to shed light on points made in the framework document, and  not
125	   to suggest any specific metrics.

127	ID                Framework for IP Performance Metrics     February 1998

129	3. Criteria for IP Performance Metrics

131	   The overarching goal of the  IP  Performance  Metrics  effort  is  to
132	   achieve  a  situation  in  which  users  and  providers  of  Internet
133	   transport service  have  an  accurate  common  understanding  of  the
134	   performance  and  reliability of the Internet component 'clouds' that
135	   they use/provide.

137	   To achieve  this,  performance  and  reliability  metrics  for  paths
138	   through  the  Internet  must  be developed.  In several IETF meetings
139	   criteria for these metrics have been specified:
140	 +    The metrics must be concrete and well-defined,
141	 +    A methodology for a metric should have the  property  that  it  is
142	      repeatable:  if  the  methodology  is  used  multiple  times under
143	      identical conditions, the same measurements should result  in  the
144	      same measurements.
145	 +    The metrics must exhibit no bias for IP  clouds  implemented  with
146	      identical technology,
147	 +    The metrics must exhibit understood and fair bias  for  IP  clouds
148	      implemented with non-identical technology,
149	 +    The metrics must be useful to users and providers in understanding
150	      the performance they experience or provide,
151	 +    The metrics must avoid inducing artificial performance goals.

153	4. Terminology for Paths and Clouds

155	   The following list defines terms that  need  to  be  precise  in  the
156	   development  of  path  metrics.  We begin with low-level notions of a
157	   path into relevant pieces.

159	   host A computer capable of communicating using  the  Internet  proto-
160	        cols; includes "routers".

162	   link A single link-level connection  between  two  (or  more)  hosts;
163	        includes leased lines, ethernets, frame relay clouds, etc.

165	   routerA host which facilitates  network-level  communication  between
166	        hosts by forwarding IP packets.

168	   path A sequence of the form < h0, l1, h1, ..., ln, hn >, where  n  >=
169	        0,  each  hi  is  a host, each li is a link between hi-1 and hi,
170	        each h1...hn-1 is a router.  A pair <li, hi> is termed a  'hop'.
171	        In  an  appropriate  operational  configuration,  the  links and
172	        routers in the path facilitate  network-layer  communication  of
173	        packets  from h0 to hn.  Note that path is a unidirectional con-
174	        cept.

176	ID                Framework for IP Performance Metrics     February 1998

178	   subpath
179	        Given a path, a subpath is any subsequence  of  the  given  path
180	        which  is itself a path.  (Thus, the first and last element of a
181	        subpath is a host.)

183	   cloudAn undirected (possibly cyclic) graph whose vertices are routers
184	        and  whose  edges are links that connect pairs of routers.  For-
185	        mally, ethernets, frame relay clouds, and other links that  con-
186	        nect  more  than  two  routers  are  modelled as fully-connected
187	        meshes of graph edges.  Note that to connect to a cloud means to
188	        connect  to  a router of the cloud over a link; this link is not
189	        itself part of the cloud.

191	   exchange
192	        A special case of a link, an exchange directly connects either a
193	        host to a cloud and/or one cloud to another cloud.

195	   cloud subpath
196	        A subpath of a given path, all of whose hosts are routers  of  a
197	        given cloud.

199	   path digest
200	        A sequence of the form < h0, e1, C1, ..., en, hn >, where  n  >=
201	        0,  h0 and hn are hosts, each e1 ... en is an exchange, and each
202	        C1 ... Cn-1 is a cloud subpath.

204	5. Fundamental Concepts

206	5.1. Metrics

208	   In the operational Internet, there are several quantities related  to
209	   the  performance  and  reliability  of the Internet that we'd like to
210	   know the value of.  When such a quantity is carefully  specified,  we
211	   term  the  quantity  a  metric.   We  anticipate  that  there will be
212	   separate RFCs for each metric (or for each closely related  group  of
213	   metrics).

215	   In some cases, there might be no obvious means to effectively measure
216	   the metric; this is allowed, and even understood to be very useful in
217	   some cases.  It is required, however, that the specification  of  the
218	   metric  be  as  clear as possible about what quantity is being speci-
219	   fied.   Thus,  difficulty  in  practical  measurement  is   sometimes
220	   allowed, but ambiguity in meaning is not.

222	   Each metric will be defined in terms of standard  units  of  measure-
223	   ment.   The  international  metric  system  will  be  used,  with the

225	ID                Framework for IP Performance Metrics     February 1998

227	   following points specifically noted:
228	 +    When a unit is expressed in simple meters (for distance/length) or
229	      seconds   (for  duration),  appropriate  related  units  based  on
230	      thousands or  thousandths  of  acceptable  units  are  acceptable.
231	      Thus,  distances expressed in kilometers (km), durations expressed
232	      in milliseconds (ms), or microseconds (us) are  allowed,  but  not
233	      centimeters  (because  the  prefix is not in terms of thousands or
234	      thousandths).
235	 +    When a unit is expressed in a combination  of  units,  appropriate
236	      related  units  based  on  thousands  or thousandths of acceptable
237	      units are acceptable, but all such thousands/thousandths  must  be
238	      grouped  at the beginning.  Thus, kilo-meters per second (km/s) is
239	      allowed, but meters per millisecond is not.
240	 +    The unit of information is the bit.
241	 +    When metric prefixes are  used  with  bits  or  with  combinations
242	      including  bits,  those  prefixes  will  have their metric meaning
243	      (related to decimal 1000), and not the meaning  conventional  with
244	      computer  storage  (related  to  decimal  1024).   In any RFC that
245	      defines a metric whose units include bits, this convention will be
246	      followed and will be repeated to ensure clarity for the reader.
247	 +    When a time is given, it will be expressed in UTC.
248	   Note that these points apply to the specifications  for  metrics  and
249	   not,  for example, to packet formats where octets will likely be used
250	   in preference/addition to bits.

252	   Finally, we note that some metrics may be defined purely in terms  of
253	   other metrics; such metrics are call 'derived metrics'.

255	5.2. Measurement Methodology

257	   For a given set of well-defined metrics, a number of  distinct  meas-
258	   urement methodologies may exist.  A partial list includes:
259	 +    Direct measurement of a performance  metric  using  injected  test
260	      traffic.   Example:  measurement  of the round-trip delay of an IP
261	      packet of a given size over a given route at a given time.
262	 +    Projection of a metric from  lower-level  measurements.   Example:
263	      given accurate measurements of propagation delay and bandwidth for
264	      each step along a path, projection of the complete delay  for  the
265	      path for an IP packet of a given size.
266	 +    Estimation of a consituent metric from a set  of  more  aggregated
267	      measurements.  Example: given accurate measurements of delay for a
268	      given one-hop path for IP packets of different  sizes,  estimation
269	      of propagation delay for the link of that one-hop path.

271	ID                Framework for IP Performance Metrics     February 1998

273	 +    Estimation of a given metric at one time from  a  set  of  related
274	      metrics at other times.  Example: given an accurate measurement of
275	      flow capacity at a past time, together  with  a  set  of  accurate
276	      delay  measurements  for  that past time and the current time, and
277	      given a model of flow dynamics, estimate the  flow  capacity  that
278	      would be observed at the current time.
279	   This list is by no means exhaustive.  The purpose is to point out the
280	   variety of measurement techniques.

282	   When a given metric is specified, a given measurement approach  might
283	   be noted and discussed.  That approach, however, is not formally part
284	   of the specification.

286	   A methodology for a metric  should  have  the  property  that  it  is
287	   repeatable: if the methodology is used multiple times under identical
288	   conditions, it should result in consistent measurements.

290	   Backing off a little from the word 'identical' in the previous  para-
291	   graph, we could more accurately use the word 'continuity' to describe
292	   a property of a given methodology: a methodology for a  given  metric
293	   exhibits  continuity  if,  for  small  variations  in  conditions, it
294	   results in small variations in the resulting measurements.   Slightly
295	   more  precisely,  for every positive epsilon, there exists a positive
296	   delta, such that if two sets of conditions are within delta  of  each
297	   other, then the resulting measurements will be within epsilon of each
298	   other.  At this point, this should be taken as  a  heuristic  driving
299	   our  intuition about one kind of robustness property rather than as a
300	   precise notion.

302	   A metric that has at least one methodology that  exhibits  continuity
303	   is said itself to exhibit continuity.

305	   Note that some metrics, such as hop-count along a path, are  integer-
306	   valued  and  therefore  cannot  exhibit continuity in quite the sense
307	   given above.

309	   Note further that, in practice, it may not be practical to  know  (or
310	   be  able  to  quantify) the conditions relevant to a measurement at a
311	   given time.  For example, since the instantaneous load (in packets to
312	   be  served)  at  a given router in a high-speed wide-area network can
313	   vary widely over relatively brief periods and will be very  hard  for
314	   an  external  observer  to  quantify,  various  statistics of a given
315	   metric may be more repeatable, or may better exhibit continuity.   In
316	   that  case  those  particular statistics should be specified when the
317	   metric is specified.

319	   Finally, some measurement methodologies may be 'conservative' in  the
320	   sense  that  the act of measurement does not modify, or only slightly

322	ID                Framework for IP Performance Metrics     February 1998

324	   modifies,  the  value  of  the  performance  metric  the  methodology
325	   attempts to measure.  {Comment: for example, in a wide-are high-speed
326	   network under modest load, a test using several small 'ping'  packets
327	   to  measure  delay  would  likely not interfere (much) with the delay
328	   properties of that network as observed by others.  The  corresponding
329	   statement  about  tests  using  a large flow to measure flow capacity
330	   would likely fail.}

332	5.3. Measurements, Uncertainties, and Errors

334	   Even the very best measurement methodologies for the very  most  well
335	   behaved metrics will exhibit errors.  Those who develop such measure-
336	   ment methodologies, however, should strive to:
337	 +    minimize their uncertainties/errors,
338	 +    understand and document the sources of uncertainty/error, and
339	 +    quantify the amounts of uncertainty/error.

341	   For example, when developing a method for measuring delay, understand
342	   how  any errors in your clocks introduce errors into your delay meas-
343	   urement, and quantify this effect as well as you can.  In some cases,
344	   this  will  result  in a requirement that a clock be at least up to a
345	   certain quality if it is to be used to make a certain measurement.

347	   As a second example, consider the timing  error  due  to  measurement
348	   overheads  within  the computer making the measurement, as opposed to
349	   delays due to the Internet component being measured.  The former is a
350	   measurement  error, while the latter reflects the metric of interest.
351	   Note that one technique that can help avoid this overhead is the  use
352	   of  a  packet  filter/sniffer,  running  on  a separate computer that
353	   records network packets and timestamps them accurately (see the  dis-
354	   cussion  of  'wire  time'  below).   The  resulting trace can then be
355	   analysed to assess the test traffic, minimising the effect  of  meas-
356	   urement  host  delays,  or  at  least  allowing  those  delays  to be
357	   accounted for.  We note that this technique may prove beneficial even
358	   if  the  packet filter/sniffer runs on the same machine, because such
359	   measurements generally provide 'kernel-level' timestamping as opposed
360	   to less-accurate 'application-level' timestamping.

362	   Finally, we note that derived metrics (defined above) or metrics that
363	   exhibit spatial or temporal composition (defined below) offer partic-
364	   ular occasion for the analysis of measurement  uncertainties,  namely
365	   how  the uncertainties propagate (conceptually) due to the derivation
366	   or composition.

368	ID                Framework for IP Performance Metrics     February 1998

370	6. Metrics and the Analytical Framework

372	   As the Internet has evolved from the early  packet-switching  studies
373	   of the 1960s, the Internet engineering community has evolved a common
374	   analytical framework of concepts.  This analytical framework,  or  A-
375	   frame,  used  by  designers  and  implementers of protocols, by those
376	   involved in measurement, and by those who study computer network per-
377	   formance using the tools of simulation and analysis, has great advan-
378	   tage to our work.  A major objective  here  is  to  generate  network
379	   characterizations  that are consistent in both analytical and practi-
380	   cal settings, since this will maximize the chances that non-empirical
381	   network  study can be better correlated with, and used to further our
382	   understanding of, real network behavior.

384	   Whenever possible, therefore, we would like to develop  and  leverage
385	   off  of  the  A-frame.   Thus,  whenever  a metric to be specified is
386	   understood to be closely related to concepts within the  A-frame,  we
387	   will attempt to specify the metric in the A-frame's terms.  In such a
388	   specification we will develop the A-frame by precisely  defining  the
389	   concepts  needed  for the metric, then leverage off of the A-frame by
390	   defining the metric in terms of those concepts.

392	   Such a metric will be called an 'analytically specified  metric'  or,
393	   more simply, an analytical metric.

395	   {Comment: Examples of such analytical metrics might include:

397	propagation time of a link
398	     The time, in seconds, required by a single bit to travel  from  the
399	     output  port  on  one Internet host across a single link to another
400	     Internet host.

402	bandwidth of a link for packets of size k
403	     The capacity, in bits/second, where  only  those  bits  of  the  IP
404	     packet are counted, for packets of size k bytes.

406	routeThe path, as defined in Section 4, from A to B at a given time.

408	hop count of a route
409	     The value 'n' of the route path.
410	     }

412	     Note that we make no a priori list of just  what  A-frame  concepts
413	     will  emerge in these specifications, but we do encourage their use
414	     and urge that they be carefully specified so that, as  our  set  of
415	     metrics  develops,  so  will  a  specified  set of A-frame concepts
416	     technically consistent with each other and consonent with the  com-
417	     mon  understanding  of  those  concepts within the general Internet

419	ID                Framework for IP Performance Metrics     February 1998

421	     community.

423	     These A-frame concepts will be intended  to  abstract  from  actual
424	     Internet components in such a way that:
425	 +    the essential function of the component is retained,
426	 +    properties of the component relevant to  the  metrics  we  aim  to
427	      create are retained,
428	 +    a subset of these component properties are potentially defined  as
429	      analytical metrics, and
430	 +    those properties of actual Internet  components  not  relevant  to
431	      defining the metrics we aim to create are dropped.

433	   For example, when considering a router in the context of packet  for-
434	   warding, we might model the router as a component that receives pack-
435	   ets on an input link, queues them on a FIFO packet  queue  of  finite
436	   size,  employs  tail-drop when the packet queue is full, and forwards
437	   them on an output link.  The transmission speed (in  bits/second)  of
438	   the  input  and output links, the latency in the router (in seconds),
439	   and the maximum size of the  packet  queue  (in  bits)  are  relevant
440	   analytical metrics.

442	   In some cases, such analytical metrics used in relation to  a  router
443	   will  be  very closely related to specific metrics of the performance
444	   of Internet paths.  For example, an obvious formula (L + P/B) involv-
445	   ing the latency in the router (L), the packet size (in bits) (P), and
446	   the transmission speed of the output link (B) might closely  approxi-
447	   mate  the  increase  in  packet delay due to the insertion of a given
448	   router along a path.

450	   We stress, however, that well-chosen and well-specified A-frame  con-
451	   cepts  and  their analytical metrics will support more general metric
452	   creation efforts in less obvious ways.

454	   {Comment: for example, when considering the flow capacity of a  path,
455	   it may be of real value to be able to model each of the routers along
456	   the path as packet forwarders as above.   Techniques  for  estimating
457	   the  flow  capacity of a path might use the maximum packet queue size
458	   as a parameter in decidedly non-obvious ways.  For  example,  as  the
459	   maximum  queue  size  increases, so will the ability of the router to
460	   continuously move traffic along an output link  despite  fluctuations
461	   in  traffic  from  an input link.  Estimating this increase, however,
462	   remains a research topic.}

464	   Note that, when we specify A-frame concepts and  analytical  metrics,
465	   we  will  inevitably  make  simplifying assumptions.  The key role of
466	   these concepts is to abstract the properties  of  the  Internet  com-
467	   ponents  relevant  to  given metrics.  Judgement is required to avoid
468	   making assumptions that bias the modeling and  metric  effort  toward

470	ID                Framework for IP Performance Metrics     February 1998

472	   one kind of design.

474	   {Comment: for example, routers might not use tail-drop,  even  though
475	   tail-drop might be easier to model analytically.}

477	   Finally, note that different elements of the A-frame might well  make
478	   different simplifying assumptions.  For example, the abstraction of a
479	   router used to further the definition of path delay might  treat  the
480	   router's  packet queue as a single FIFO queue, but the abstraction of
481	   a router used to further the definition of the handling of  an  RSVP-
482	   enabled  packet  might  treat the router's packet queue as supporting
483	   bounded delay -- a contradictory assumption.  This is not to say that
484	   we make contradictory assumptions at the same time, but that two dif-
485	   ferent parts of our work might refine the simpler base concept in two
486	   divergent ways for different purposes.

488	   {Comment: in more mathematical terms, we would say that  the  A-frame
489	   taken  as  a  whole need not be consistent; but the set of particular
490	   A-frame elements used to define a particular metric must be.}

492	7. Empirically Specified Metrics

494	   There are useful performance and reliability metrics that do not  fit
495	   so  neatly  into  the  A-frame, usually because the A-frame lacks the
496	   detail or power for dealing with them.  For example, "the  best  flow
497	   capacity  achievable  along  a  path using an RFC-2001-compliant TCP"
498	   would be good to be able to measure, but we have no analytical frame-
499	   work of sufficient richness to allow us to cast that flow capacity as
500	   an analytical metric.

502	   These notions can still be well specified  by  instead  describing  a
503	   reference methodology for measuring them.

505	   Such a metric will be called an 'empirically  specified  metric',  or
506	   more simply, an empirical metric.

508	   Such empirical metrics should have three properties:
509	 +    we should have a clear definition for each in  terms  of  Internet
510	      components,
511	 +    we should have at least one effective means to measure them, and
512	 +    to the extent possible, we should have an (necessarily incomplete)
513	      understanding of the metric in terms of the A-frame so that we can
514	      use our measurements to reason about the performance and reliabil-
515	      ity  of  A-frame  components  and  of aggregations of A-frame com-
516	      ponents.

518	ID                Framework for IP Performance Metrics     February 1998

520	8. Two Forms of Composition

522	8.1. Spatial Composition of Metrics

524	   In some cases, it may be realistic and useful to  define  metrics  in
525	   such a fashion that they exhibit spatial composition.

527	   By spatial  composition,  we  mean  a  characteristic  of  some  path
528	   metrics, in which the metric as applied to a (complete) path can also
529	   be defined for various subpaths, and in which the appropriate A-frame
530	   concepts  for  the  metric  suggest  useful relationships between the
531	   metric applied to these  various  subpaths  (including  the  complete
532	   path,  the  various  cloud  subpaths of a given path digest, and even
533	   single routers along the path).  The effectiveness of spatial  compo-
534	   sition depends:
535	 +    on the usefulness in analysis of these relationships as applied to
536	      the relevant A-frame components, and
537	 +    on the practical use of the corresponding relationships as applied
538	      to metrics and to measurement methodologies.

540	   {Comment: for example, consider some metric for delay of  a  100-byte
541	   packet  across  a path P, and consider further a path digest <h0, e1,
542	   C1, ..., en, hn> of P.  The definition of such a metric might include
543	   a  conjecture  that  the delay across P is very nearly the sum of the
544	   corresponding metric across the exhanges (ei) and clouds (Ci) of  the
545	   given  path  digest.   The definition would further include a note on
546	   how a corresponding relation applies to relevant A-frame  components,
547	   both for the path P and for the exchanges and clouds of the path dig-
548	   est.}

550	   When the definition of a metric includes a conjecture that the metric
551	   across  the  path is related to the metric across the subpaths of the
552	   path, that conjecture constitutes a claim that  the  metric  exhibits
553	   spatial composition.  The definition should then include:
554	 +    the specific conjecture applied to the metric,
555	 +    a justification of the practical utility  of  the  composition  in
556	      terms of making accurate measurements of the metric on the path,
557	 +    a justification of the usefulness of the composition in  terms  of
558	      making analysis of the path using A-frame concepts more effective,
559	      and
560	 +    an analysis of how the conjecture could be incorrect.

562	ID                Framework for IP Performance Metrics     February 1998

564	8.2. Temporal Composition of Formal Models and Empirical Metrics

566	   In some cases, it may be realistic and useful to  define  metrics  in
567	   such a fashion that they exhibit temporal composition.

569	   By temporal composition,  we  mean  a  characteristic  of  some  path
570	   metric, in which the metric as applied to a path at a given time T is
571	   also defined for various times t0 < t1 < ... < tn < T, and  in  which
572	   the appropriate A-frame concepts for the metric suggests useful rela-
573	   tionships between the metric applied at times t0,  ...,  tn  and  the
574	   metric  applied at time T.  The effectiveness of temporal composition
575	   depends:
576	 +    on the usefulness in analysis of these relationships as applied to
577	      the relevant A-frame components, and
578	 +    on the practical use of the corresponding relationships as applied
579	      to metrics and to measurement methodologies.

581	   {Comment: for example, consider a  metric for the expected flow capa-
582	   city  across  a  path P during the five-minute period surrounding the
583	   time T, and suppose further that we have the corresponding values for
584	   each  of  the  four  previous five-minute periods t0, t1, t2, and t3.
585	   The definition of such a metric might include a conjecture  that  the
586	   flow  capacity  at  time  T  can  be estimated from a certain kind of
587	   extrapolation from the values of t0, ..., t3.  The  definition  would
588	   further  include  a  note  on how a corresponding relation applies to
589	   relevant A-frame components.

591	   Note: any (spatial or temporal) compositions involving flow  capacity
592	   are likely to be subtle, and temporal compositions are generally more
593	   subtle than spatial compositions, so  the  reader  should  understand
594	   that the foregoing example is intentionally naive.}

596	   When the definition of a metric includes a conjecture that the metric
597	   across the path at a given time T is related to the metric across the
598	   path for a set of other times, that conjecture  constitutes  a  claim
599	   that the metric exhibits temporal composition.  The definition should
600	   then include:
601	 +    the specific conjecture applied to the metric,
602	 +    a justification of the practical utility  of  the  composition  in
603	      terms  of  making accurate measurements of the metric on the path,
604	      and
605	 +    a justification of the usefulness of the composition in  terms  of
606	      making analysis of the path using A-frame concepts more effective.

608	ID                Framework for IP Performance Metrics     February 1998

610	9. Issues related to Time

612	9.1. Clock Issues

614	   Measurements of time lie at  the  heart  of  many  Internet  metrics.
615	   Because  of this, it will often be crucial when designing a methodol-
616	   ogy for measuring a metric  to  understand  the  different  types  of
617	   errors  and  uncertainties  introduced  by imperfect clocks.  In this
618	   section we define terminology for discussing the  characteristics  of
619	   clocks  and  touch  upon  related measurement issues which need to be
620	   addressed by any sound methodology.

622	   The Network Time Protocol (NTP; RFC 1305) defines a nomenclature  for
623	   discussing  clock  characteristics,  which  we  will  also  use  when
624	   appropriate [Mi92].  The main goal of  NTP  is  to  provide  accurate
625	   timekeeping  over  fairly  long time scales, such as minutes to days,
626	   while for measurement  purposes  often  what  is  more  important  is
627	   short-term accuracy, between the beginning of the measurement and the
628	   end, or over the course of gathering a body of measurements  (a  sam-
629	   ple).   This difference in goals sometimes leads to different defini-
630	   tions of terminology as well, as discussed below.

632	   To begin, we define a clock's "offset" at a particular moment as  the
633	   difference between the time reported by the clock and the "true" time
634	   as defined by UTC.  If the clock reports a time Tc and the true  time
635	   is Tt, then the clock's offset is Tc - Tt.

637	   We will refer to a clock as "accurate" at a particular moment if  the
638	   clock's  offset  is  zero, and more generally a clock's "accuracy" is
639	   how close the absolute value of the offset  is  to  zero.   For  NTP,
640	   accuracy  also  includes  a notion of the frequency of the clock; for
641	   our purposes, we instead incorporate this notion into that of "skew",
642	   because we define accuracy in terms of a single moment in time rather
643	   than over an interval of time.

645	   A clock's "skew" at a particular moment is the  frequency  difference
646	   (first  derivative  of  its offset with respect to true time) between
647	   the clock and true time.

649	   As noted in RFC 1305, real clocks exhibit  some  variation  in  skew.
650	   That  is, the second derivative of the clock's offset with respect to
651	   true time is generally non-zero.  In keeping with RFC 1305, we define
652	   this quantity as the clock's "drift".

654	   A clock's "resolution" is the smallest unit by which the clock's time
655	   is  updated.   It  gives  a  lower  bound on the clock's uncertainty.
656	   (Note that clocks can have very fine resolutions and  yet  be  wildly

658	ID                Framework for IP Performance Metrics     February 1998

660	   inaccurate.)   Resolution  is  defined in terms of seconds.  However,
661	   resolution is relative to the clock's reported time and not  to  true
662	   time,  so for example a resolution of 10 ms only means that the clock
663	   updates its notion of time in 0.01 second increments, not  that  this
664	   is the true amount of time between updates.

666	   {Comment: Systems differ on how an application interface to the clock
667	   reports  the  time on subsequent calls during which the clock has not
668	   advanced.  Some systems simply return  the  same  unchanged  time  as
669	   given  for  previous  calls.  Others may add a small increment to the
670	   reported time to maintain monotonic increasing timestamps.  For  sys-
671	   tems  that do the latter, we do *not* consider these small increments
672	   when defining the clock's resolution.  They are instead an impediment
673	   to assessing the clock's resolution, since a natural method for doing
674	   so is to repeatedly query the clock to determine  the  smallest  non-
675	   zero difference in reported times.}

677	   It is expected that a clock's resolution  changes  only  rarely  (for
678	   example, due to a hardware upgrade).

680	   There are a number of interesting  metrics  for  which  some  natural
681	   measurement  methodologies  involve  comparing  times reported by two
682	   different clocks.  An example is one-way packet delay [AK96].   Here,
683	   the time required for a packet to travel through the network is meas-
684	   ured by comparing the time reported by a clock  at  one  end  of  the
685	   packet's  path,  corresponding  to  when the packet first entered the
686	   network, with the time reported by a clock at the other  end  of  the
687	   path,  corresponding  to when the packet finished traversing the net-
688	   work.

690	   We are thus also interested in terminology  for  describing  how  two
691	   clocks  C1  and  C2 compare.  To do so, we introduce terms related to
692	   those above in which the notion of "true time"  is  replaced  by  the
693	   time  as  reported by clock C1.  For example, clock C2's offset rela-
694	   tive to C1 at a particular moment is Tc2  -  Tc1,  the  instantaneous
695	   difference  in  time  reported by C2 and C1.  To disambiguate between
696	   the use of the terms to compare two clocks  versus  the  use  of  the
697	   terms  to  compare  to  true time, we will in the former case use the
698	   phrase "relative".  So the offset defined earlier in  this  paragraph
699	   is the "relative offset" between C2 and C1.

701	   When comparing clocks, the analog of "resolution"  is  not  "relative
702	   resolution",  but instead "joint resolution", which is the sum of the
703	   resolutions of C1 and C2.  The joint resolution then indicates a con-
704	   servative  lower bound on the accuracy of any time intervals computed
705	   by subtracting timestamps generated by one clock from those generated
706	   by the other.

708	ID                Framework for IP Performance Metrics     February 1998

710	   If two clocks are "accurate" with respect to one another (their rela-
711	   tive  offset  is  zero), we will refer to the pair of clocks as "syn-
712	   chronized".  Note that clocks can be highly  synchronized  yet  arbi-
713	   trarily  inaccurate  in  terms of how well they tell true time.  This
714	   point is important because for many Internet measurements, synchroni-
715	   zation  between two clocks is more important than the accuracy of the
716	   clocks.  The is somewhat true of skew, too: as long as  the  absolute
717	   skew  is not too great, then minimal relative skew is more important,
718	   as it can induce systematic trends in packet transit  times  measured
719	   by comparing timestamps produced by the two clocks.

721	   These distinctions arise because for  Internet  measurement  what  is
722	   often most important are differences in time as computed by comparing
723	   the output of two clocks.  The process of  computing  the  difference
724	   removes  any  error  due  to  clock inaccuracies with respect to true
725	   time; but it is crucial that the  differences  themselves  accurately
726	   reflect differences in true time.

728	   Measurement methodologies will often begin with the step of  assuring
729	   that  two  clocks  are  synchronized and have minimal skew and drift.
730	   {Comment: An effective way to assure these conditions (and also clock
731	   accuracy) is by using clocks that derive their notion of time from an
732	   external source, rather than only the host computer's clock.   (These
733	   latter  are  often subject to large errors.) It is further preferable
734	   that the clocks directly derive their time,  for  example  by  having
735	   immediate access to a GPS (Global Positioning System) unit.}

737	   Two important concerns arise if the clocks  indirectly  derive  their
738	   time using a network time synchronization protocol such as NTP:
739	 +    First, NTP's accuracy depends in part on the properties  (particu-
740	      larly  delay)  of  the  Internet  paths used by the NTP peers, and
741	      these might be exactly the properties that we wish to measure,  so
742	      it would be unsound to use NTP to calibrate such measurements.
743	 +    Second, NTP focuses on clock  accuracy,  which  can  come  at  the
744	      expense  of  short-term clock skew and drift.  For example, when a
745	      host's clock is indirectly synchronized to a time source,  if  the
746	      synchronization  intervals  occur infrequently, then the host will
747	      sometimes be faced with the problem of how to adjust its  current,
748	      incorrect  time,  Ti, with a considerably different, more accurate
749	      time it has just learned, Ta.  Two general ways in which  this  is
750	      done  are  to either immediately set the current time to Ta, or to
751	      adjust the local clock's update frequency  (hence,  its  skew)  so
752	      that  at  some  point  in the future the local time Ti' will agree
753	      with the more accurate time Ta'.  The first  mechanism  introduces
754	      discontinuities  and  can  also  violate  common  assumptions that
755	      timestamps are monotone increasing.  If the host's  clock  is  set
756	      backward  in  time, sometimes this can be easily detected.  If the
757	      clock is set forward in time, this can be harder to  detect.   The

759	ID                Framework for IP Performance Metrics     February 1998

761	      skew  induced  by  the  second  mechanism can lead to considerable
762	      inaccuracies when computing  differences  in  time,  as  discussed
763	      above.

765	   To illustrate why skew is a  crucial  concern,  consider  samples  of
766	   one-way  delays  between two Internet hosts made at one minute inter-
767	   vals.  The true transmission delay between the hosts might  plausibly
768	   be  on  the  order of 50 ms for a transcontinental path.  If the skew
769	   between the two clocks is 0.01%, that is,  1  part  in  10,000,  then
770	   after  10  minutes of observation the error introduced into the meas-
771	   urement is 60 ms.  Unless corrected, this error  is  enough  to  com-
772	   pletely  wipe out any accuracy in the transmission delay measurement.
773	   Finally, we note that assessing skew  errors  between  unsynchronized
774	   network  clocks  is an open research area.  (See [Pa97] for a discus-
775	   sion of detecting and compensating for these sorts of  errors.)  This
776	   shortcoming  makes  use  of a solid, independent clock source such as
777	   GPS especially desirable.

779	9.2. The Notion of "Wire Time"

781	   Internet measurement is often complicated  by  the  use  of  Internet
782	   hosts  themselves to perform the measurement.  These hosts can intro-
783	   duce delays, bottlenecks, and the like that are due  to  hardware  or
784	   operating  system  effects  and  have  nothing to do with the network
785	   behavior we would like to  measure.   This  problem  is  particularly
786	   acute  when  timestamping of network events occurs at the application
787	   level.

789	   In order to provide a general way of talking about these effects,  we
790	   introduce two notions of "wire time".  These notions are only defined
791	   in terms of an Internet host H observing an Internet link L at a par-
792	   ticular location:
793	 +    For a given packet P, the 'wire arrival time' of P at H  on  L  is
794	      the  first time T at which any bit of P has appeared at H's obser-
795	      vational position on L.
796	 +    For a given packet P, the 'wire exit time' of P at H on L  is  the
797	      first  time  T  at  which  all  the bits of P have appeared at H's
798	      observational position on L.
799	   Note that intrinsic to the definition is the notion of where  on  the
800	   link  we  are  observing.   This distinction is important because for
801	   large-latency links, we may obtain very different times depending  on
802	   exactly where we are observing the link.  We could allow the observa-
803	   tional position to be an arbitrary location along the link;  however,
804	   we define it to be in terms of an Internet host because we anticipate
805	   in practice that, for IPPM metrics, all  such  timing  will  be  con-
806	   strained  to  be performed by Internet hosts, rather than specialized
807	   hardware devices that might be able to monitor a  link  at  locations

809	ID                Framework for IP Performance Metrics     February 1998

811	   where  a host cannot.  This definition also takes care of the problem
812	   of links that are comprised of multiple physical  channels.   Because
813	   these  multiple channels are not visible at the IP layer, they cannot
814	   be individually observed in terms of the above definitions.

816	   It is possible, though one hopes uncommon, that a packet P might make
817	   multiple  trips  over  a particular link L, due to a forwarding loop.
818	   These trips might even overlap, depending  on  the  link  technology.
819	   Whenever  this occurs, we define a separate wire time associated with
820	   each instance of P seen at H's position on the link.  This definition
821	   is  worth  making  because  it serves as a reminder that notions like
822	   *the* unique time a  packet  passes  a  point  in  the  Internet  are
823	   inherently slippery.

825	   The term wire time has historically been used to loosely  denote  the
826	   time at which a packet appeared on a link, without exactly specifying
827	   whether this refers to the first bit, the last  bit,  or  some  other
828	   consideration.   This  informal  definition is generally already very
829	   useful, as it is usually used to make a distinction between when  the
830	   packet's  propagation delays begin and cease to be due to the network
831	   rather than the endpoint hosts.

833	   When appropriate, metrics should be defined in terms  of  wire  times
834	   rather  than  host  endpoint  times,  so that the metric's definition
835	   highlights the issue of separating delays due to the host from  those
836	   due to the network.

838	   We note that one potential difficulty when dealing  with  wire  times
839	   concerns  IP  fragments.   It may be the case that, due to fragmenta-
840	   tion, only a portion of a particular packet passes by  H's  location.
841	   Such  fragments  are  themselves  legitimate  packets  and have well-
842	   defined wire times associated with them; but  the  larger  IP  packet
843	   corresponding to their aggregate may not.

845	   We also note that these notions have not, to our knowledge, been pre-
846	   viously  defined  in exact terms for Internet traffic.  Consequently,
847	   we may find with  experience  that  these  definitions  require  some
848	   adjustment in the future.

850	   {Comment: It can sometimes be difficult to measure wire  times.   One
851	   technique  is  to  use  a packet filter to monitor traffic on a link.
852	   The architecture of these filters often attempts  to  associate  with
853	   each  packet  a  timestamp as close to the wire time as possible.  We
854	   note however that one common source of error is  to  run  the  packet
855	   filter  on  one  of  the  endpoint  hosts.  In this case, it has been
856	   observed that some packet filters receive for some packets timestamps
857	   corresponding  to when the packet was *scheduled* to be injected into
858	   the network, rather than when it actually was  *sent*  out  onto  the

860	ID                Framework for IP Performance Metrics     February 1998

862	   network  (wire  time).  There can be a substantial difference between
863	   these two times.  A technique for dealing with this problem is to run
864	   the  packet  filter  on  a  separate host that passively monitors the
865	   given link.  This can be problematic however for some link  technolo-
866	   gies.   See  [Pa97]  for  a  discussion of the sorts of errors packet
867	   filters can exhibit.  Finally, we note that packet filters will often
868	   only capture the first fragment of a fragmented IP packet, due to the
869	   use of filtering on fields in the IP and transport protocol  headers.
870	   As  we  generally  desire  our measurement methodologies to avoid the
871	   complexity of creating fragmented traffic, one strategy  for  dealing
872	   with  their  presence  as detected by a packet filter is to flag that
873	   the measured traffic has an unusual form and abandon further analysis
874	   of the packet timing.}

876	10. Singletons, Samples, and Statistics

878	   With experience we have found it useful  to  introduce  a  separation
879	   between three distinct -- yet related -- notions:
880	 +    By a 'singleton' metric, we refer to metrics that are, in a sense,
881	      atomic.   For example, a single instance of "bulk throughput capa-
882	      city" from one host to another might be  defined  as  a  singleton
883	      metric,  even though the instance involves measuring the timing of
884	      a number of Internet packets.
885	 +    By a 'sample' metric, we refer to metrics  derived  from  a  given
886	      singleton   metric  by  taking  a  number  of  distinct  instances
887	      together.  For example, we might define a sample metric of one-way
888	      delays  from  one  host  to another as an hour's worth of measure-
889	      ments, each made at Poisson intervals with a mean spacing  of  one
890	      second.
891	 +    By a 'statistical' metric, we refer  to  metrics  derived  from  a
892	      given  sample  metric  by  computing  some statistic of the values
893	      defined by the singleton metric on the sample.  For  example,  the
894	      mean  of  all  the  one-way delay values on the sample given above
895	      might be defined as a statistical metric.
896	   By applying these notions of singleton, sample, and  statistic  in  a
897	   consistent way, we will be able to reuse lessons learned about how to
898	   define samples and statistics on various metrics.  The  orthogonality
899	   among  these three notions will thus make all our work more effective
900	   and more intelligible by the community.

902	   In the remainder of this section, we will cover some topics  in  sam-
903	   pling  and  statistics that we believe will be important to a variety
904	   of metric definitions and measurement efforts.

906	ID                Framework for IP Performance Metrics     February 1998

908	10.1. Methods of Collecting Samples

910	   The main reason for collecting samples is to see what sort of  varia-
911	   tions  and  consistencies  are  present in the metric being measured.
912	   These variations might be with respect to  different  points  in  the
913	   Internet,  or different measurement times.  When assessing variations
914	   based on a sample, one generally makes an assumption that the  sample
915	   is  "unbiased",  meaning  that the process of collecting the measure-
916	   ments in the sample did not skew the sample  so  that  it  no  longer
917	   accurately reflects the metric's variations and consistencies.

919	   One  common  way  of  collecting  samples  is  to  make  measurements
920	   separated by fixed amounts of time: periodic sampling.  Periodic sam-
921	   pling is particularly attractive because of its  simplicity,  but  it
922	   suffers from two potential problems:
923	 +    If the metric being measured itself  exhibits  periodic  behavior,
924	      then  there  is  a possibility that the sampling will observe only
925	      part of the periodic behavior  if  the  periods  happen  to  agree
926	      (either  directly, or if one is a multiple of the other).  Related
927	      to this problem is the notion that periodic sampling can be easily
928	      anticipated.   Predictable sampling is susceptible to manipulation
929	      if there are mechanisms by which a  network  component's  behavior
930	      can  be  temporarily  changed such that the sampling only sees the
931	      modified behavior.
932	 +    The act of measurement can perturb what  is  being  measured  (for
933	      example,  injecting  measurement traffic into a network alters the
934	      congestion level of the network), and repeated periodic  perturba-
935	      tions  can  drive  a  network into a state of synchronization (cf.
936	      [FJ94]), greatly  magnifying  what  might  individually  be  minor
937	      effects.

939	   A more sound approach is based on "random additive sampling": samples
940	   are  separated by independent, randomly generated intervals that have
941	   a common statistical distribution G(t) [BM92].  The quality  of  this
942	   sampling depends on the distribution G(t).  For example, if G(t) gen-
943	   erates a constant value g with probability  one,  then  the  sampling
944	   reduces to periodic sampling with a period of g.

946	   Random additive sampling gains significant advantages.   In  general,
947	   it  avoids synchronization effects and yields an unbiased estimate of
948	   the property being sampled.  The only significant drawbacks  with  it
949	   are:
950	 +    it complicates frequency-domain analysis, because the  samples  do
951	      not  occur at fixed intervals such as assumed by Fourier-transform
952	      techniques; and

954	ID                Framework for IP Performance Metrics     February 1998

956	 +    unless G(t) is the exponential distribution (see below),  sampling
957	      still remains somewhat predictable, as discussed for periodic sam-
958	      pling above.

960	10.1.1. Poisson Sampling

962	   It can be proved that if G(t) is  an  exponential  distribution  with
963	   rate lambda, that is
964	       G(t) = 1 - exp(-lambda * t)
965	   then the arrival of new samples *cannot* be  predicted  (and,  again,
966	   the  sampling  is unbiased).  Furthermore, the sampling is asymptoti-
967	   cally unbiased even if the act  of  sampling  affects  the  network's
968	   state.   Such  sampling  is referred to as "Poisson sampling".  It is
969	   not prone to inducing synchronization, it can be used  to  accurately
970	   collect  measurements  of  periodic  behavior, and it is not prone to
971	   manipulation by anticipating when new samples will occur.

973	   Because of these valuable properties, we in general prefer that  sam-
974	   ples  of  Internet  measurements are gathered using Poisson sampling.
975	   {Comment: We note, however, that  there  may  be  circumstances  that
976	   favor  use of a different G(t).  For example, the exponential distri-
977	   bution is unbounded, so its use will  on  occasion  generate  lengthy
978	   spaces  between sampling times.  We might instead desire to bound the
979	   longest such interval to a maximum value dT, to speed the convergence
980	   of  the  estimation derived from the sampling.  This could be done by
981	   using
982	       G(t) = Unif(0, dT)
983	   that is, the uniform distribution between 0 and dT.   This  sampling,
984	   of course, becomes highly predictable if an interval of nearly length
985	   dT has elapsed without a sample occurring.}

987	   In its purest form, Poisson sampling is done by  generating  indepen-
988	   dent,  exponentially  distributed  intervals  and  gathering a single
989	   measurement after each interval has elapsed.  It can be shown that if
990	   starting at time T one performs Poisson sampling over an interval dT,
991	   during which a total of N measurements happen to be made, then  those
992	   measurements  will  be  uniformly  distributed  over the interval [T,
993	   T+dT].  So another way of conducting Poisson sampling is to  pick  dT
994	   and  N and generate N random sampling times uniformly over the inter-
995	   val [T, T+dT].  The two approaches are equivalent, except if N and dT
996	   are  externally  known.  In that case, the property of not being able
997	   to predict measurement times is weakened (the other properties  still
998	   hold).   The  N/dT  approach has an advantage that dealing with fixed
999	   values of N and dT can be simpler than dealing with  a  fixed  lambda
1000	   but variable numbers of measurements over variably-sized intervals.

1002	ID                Framework for IP Performance Metrics     February 1998

1004	10.1.2. Geometric Sampling

1006	   Closely related to Poisson sampling is "geometric sampling", in which
1007	   external  events  are measured with a fixed probability p.  For exam-
1008	   ple, one might capture all the packets over a link  but  only  record
1009	   the  packet  to a trace file if a randomly generated number uniformly
1010	   distributed between 0 and 1 is less than a given p.   Geometric  sam-
1011	   pling  has  the same properties of being unbiased and not predictable
1012	   in advance as Poisson sampling, so if it fits a  particular  Internet
1013	   measurement task, it too is sound.  See [CPB93] for more discussion.

1015	10.1.3. Generating Poisson Sampling Intervals

1017	   To generate Poisson sampling intervals, one first determines the rate
1018	   lambda  at  which  the singleton measurements will on average be made
1019	   (e.g., for an average sampling interval of 30 seconds, we have lambda
1020	   =  1/30,  if  the  units  of time are seconds).  One then generates a
1021	   series of exponentially-distributed (pseudo-)random numbers  E1,  E2,
1022	   ...,  En.  The first measurement is made at time E1, the next at time
1023	   E1+E2, and so on.

1025	   One  technique  for  generating  exponentially-distributed   (pseudo-
1026	   )random  numbers is based on the ability to generate U1, U2, ..., Un,
1027	   (pseudo-)random numbers that are uniformly distributed between 0  and
1028	   1.   Many  computers  provide libraries that can do this.  Given such
1029	   Ui, to generate Ei one uses:
1030	       Ei = -log(Ui) / lambda
1031	   where log(Ui) is the natural logarithm of Ui.  {Comment:  This  tech-
1032	   nique  is  an instance of the more general "inverse transform" method
1033	   for generating random numbers with a given distribution.}

1035	   Implementation details:

1037	   There are at least three different methods for approximating  Poisson
1038	   sampling, which we describe here as Methods 1 through 3.  Method 1 is
1039	   the easiest to implement and has the most error, and method 3 is  the
1040	   most  difficult  to  implement  and  has the least error (potentially
1041	   none).

1043	   Method 1 is to proceed as follows:
1044	   1.  Generate E1 and wait that long.
1045	   2.  Perform a measurement.
1046	   3.  Generate E2 and wait that long.
1047	   4.  Perform a measurement.
1048	   5.  Generate E3 and wait that long.
1049	   6.  Perform a measurement ...

1051	ID                Framework for IP Performance Metrics     February 1998

1053	   The problem with this approach is that the  "Perform  a  measurement"
1054	   steps  themselves take time, so the sampling is not done at times E1,
1055	   E1+E2, etc., but rather at E1, E1+M1+E2, etc., where Mi is the amount
1056	   of  time required for the i'th measurement.  If Mi is very small com-
1057	   pared to 1/lambda then the potential error introduced by  this  tech-
1058	   nique  is likewise small.  As Mi becomes a non-negligible fraction of
1059	   1/lambda, the potential error increases.

1061	   Method 2 attempts to correct this error by taking  into  account  the
1062	   amount  of  time  required  by  the measurements (i.e., the Mi's) and
1063	   adjusting the waiting intervals accordingly:
1064	   1.  Generate E1 and wait that long.
1065	   2.  Perform a measurement and measure M1, the time it took to do so.
1066	   3.  Generate E2 and wait for a time E2-M1.
1067	   4.  Perform a measurement and measure M2 ..

1069	   This approach works fine as long as E{i+1} >= Mi.  But if E{i+1} < Mi
1070	   then  it is impossible to wait the proper amount of time.  (Note that
1071	   this case corresponds to needing to perform two  measurements  simul-
1072	   taneously.)

1074	   Method 3 is generating a schedule of  measurement  times  E1,  E1+E2,
1075	   etc., and then sticking to it:
1076	   1.  Generate E1, E2, ..., En.
1077	   2.  Compute measurement times T1, T2, ..., Tn, as Ti = E1 + ... + Ei.
1078	   3.  Arrange that at times T1, T2, ..., Tn, a measurement is made.

1080	   By allowing simultaneous measurements, Method 3 avoids the  shortcom-
1081	   ings  of  Methods  1  and  2.  If, however, simultaneous measurements
1082	   interfere with one another, then Method 3 does not gain  any  benefit
1083	   and may actually prove worse than Methods 1 or 2.

1085	   For Internet phenomena, it is not known to what degree the  inaccura-
1086	   cies  of  these  methods  are significant.  If the Mi's are much less
1087	   than 1/lambda, then any of the three should suffice.  If the Mi's are
1088	   less  than  1/lambda  but  perhaps not greatly less, then Method 2 is
1089	   preferred to Method 1.  If simultaneous measurements do not interfere
1090	   with  one  another, then Method 3 is preferred, though it can be con-
1091	   siderably harder to implement.

1093	10.2. Self-Consistency

1095	   A fundamental requirement for a sound measurement methodology is that
1096	   measurement be made using as few unconfirmed assumptions as possible.
1097	   Experience has painfully shown how easy  it  is  to  make  an  (often
1098	   implicit)  assumption  that turns out to be incorrect.  An example is
1099	   incorporating into a measurement the reading of a clock  synchronized

1101	ID                Framework for IP Performance Metrics     February 1998

1103	   to  a highly accurate source.  It is easy to assume that the clock is
1104	   therefore accurate; but due to software bugs, a loss of power in  the
1105	   source,  or a loss of communication between the source and the clock,
1106	   the clock could actually be quite inaccurate.

1108	   This is not to argue that one must not make  *any*  assumptions  when
1109	   measuring, but rather that, to the extent which is practical, assump-
1110	   tions should be tested.  One  powerful  way  for  doing  so  involves
1111	   checking  for  self-consistency.   Such  checking applies both to the
1112	   observed value(s) of the measurement *and  the  values  used  by  the
1113	   measurement  process itself*.  A simple example of the former is that
1114	   when computing a round trip time, one should check to see  if  it  is
1115	   negative.  Since negative time intervals are non-physical, if it ever
1116	   is negative that finding immediately flags an error.  *These sorts of
1117	   errors should then be investigated!* It is crucial to determine where
1118	   the error lies, because only by doing so diligently can we  build  up
1119	   faith  in  a  methodology's  fundamental  soundness.  For example, it
1120	   could be that the round trip time  is  negative  because  during  the
1121	   measurement  the clock was set backward in the process of synchroniz-
1122	   ing it with another source.  But it could also be that  the  measure-
1123	   ment program accesses uninitialized memory in one of its computations
1124	   and, only very rarely, that  leads  to  a  bogus  computation.   This
1125	   second  error  is more serious, if the same program is used by others
1126	   to perform the same measurement, since then they too will suffer from
1127	   incorrect  results.  Furthermore, once uncovered it can be completely
1128	   fixed.

1130	   A more subtle example of  testing  for  self-consistency  comes  from
1131	   gathering  samples  of  one-way  Internet delays.  If one has a large
1132	   sample of such delays, it may well be highly telling to, for example,
1133	   fit  a line to the pairs of (time of measurement, measured delay), to
1134	   see if the resulting line has a clearly non-zero  slope.   If  so,  a
1135	   possible  interpretation  is that one of the clocks used in the meas-
1136	   urements is skewed relative to the other.  Another interpretation  is
1137	   that the slope is actually due to genuine network effects.  Determin-
1138	   ing which is indeed the case will often be highly illuminating.  (See
1139	   [Pa97] for a discussion of distinguishing between relative clock skew
1140	   and genuine network effects.) Furthermore, if making  this  check  is
1141	   part  of  the methodology, then a finding that the long-term slope is
1142	   very near zero is positive evidence that the measurements  are  prob-
1143	   ably not biased by a difference in skew.

1145	   A final example illustrates checking the measurement  process  itself
1146	   for  self-consistency.  Above we outline Poisson sampling techniques,
1147	   based on generating  exponentially-distributed  intervals.   A  sound
1148	   measurement methodology would include testing the generated intervals
1149	   to see whether they are indeed exponentially distributed (and also to
1150	   see if they suffer from correlation).  In the appendix we discuss and

1152	ID                Framework for IP Performance Metrics     February 1998

1154	   give C code for one such technique, a general-purpose,  well-regarded
1155	   goodness-of-fit test called the Anderson-Darling test.

1157	   Finally, we note that what is truly relevant for Poisson sampling  of
1158	   Internet  metrics  is  often  not when the measurements began but the
1159	   wire times corresponding to the  measurement  process.   These  could
1160	   well  be different, due to complications on the hosts used to perform
1161	   the measurement.  Thus, even  those  with  complete  faith  in  their
1162	   pseudo-random   number   generators  and  subsequent  algorithms  are
1163	   encouraged to consider how they might test the  assumptions  of  each
1164	   measurement procedure as much as possible.

1166	10.3. Defining Statistical Distributions

1168	   One way of describing a collection of measurements (a sample) is as a
1169	   statistical  distribution  --  informally, as percentiles.  There are
1170	   several slightly different ways of doing  so.   In  this  section  we
1171	   define  a  standard  definition  to give uniformity to these descrip-
1172	   tions.

1174	   The "empirical distribution function" (EDF) of a set of scalar  meas-
1175	   urements is a function F(x) which for any x gives the fractional pro-
1176	   portion of the total measurements that were <= x.  If x is less  than
1177	   the  minimum  value  observed,  then  F(x) is 0.  If it is greater or
1178	   equal to the maximum value observed, then F(x) is 1.

1180	   For example, given the 6 measurements:
1181	   -2, 7, 7, 4, 18, -5
1182	   Then F(-8) = 0, F(-5) = 1/6, F(-5.0001) = 0, F(-4.999) = 1/6, F(7)  =
1183	   5/6, F(18) = 1, F(239) = 1.

1185	   Note that we can recover the different measured values and  how  many
1186	   times  each  occurred from F(x) -- no information regarding the range
1187	   in values is lost.  Summarizing measurements using histograms, on the
1188	   other  hand,  in general loses information about the different values
1189	   observed, so the EDF is preferred.

1191	   Using either the EDF or a histogram, however, we do lose  information
1192	   regarding  the order in which the values were observed.  Whether this
1193	   loss is potentially significant will depend on the metric being meas-
1194	   ured.

1196	   We will use the term "percentile" to refer to the smallest value of x
1197	   for  which F(x) >= a given percentage.  So the 50th percentile of the
1198	   example above is 4, since F(4) = 3/6 = 50%; the  25th  percentile  is
1199	   -2,  since  F(-5) = 1/6 < 25%, and F(-2) = 2/6 >= 25%; the 100th per-
1200	   centile is 18; and the 0th percentile is -infinity, as  is  the  15th

1202	ID                Framework for IP Performance Metrics     February 1998

1204	   percentile.

1206	   Care must be taken when using  percentiles  to  summarize  a  sample,
1207	   because  they  can  lend  an unwarranted appearance of more precision
1208	   than is really available.  Any such summary must include  the  sample
1209	   size N, because any percentile difference finer than 1/N is below the
1210	   resolution of the sample.

1212	   See [DS86] for more details regarding EDF's.

1214	   We close with a note on the common (and important!) notion of median.
1215	   In  statistics,  the  median  of  a distribution is defined to be the
1216	   point X for which the probability of observing a value <= X is  equal
1217	   to  the  probability  of  observing a value > X.  When estimating the
1218	   median of a set of observations, the estimate depends on whether  the
1219	   number of observations, N, is odd or even:
1220	 +    If N is odd, then the 50th percentile as defined above is used  as
1221	      the estimated median.
1222	 +    If N is even, then the estimated median is the average of the cen-
1223	      tral  two observations; that is, if the observations are sorted in
1224	      ascending order and numbered from 1 to N, where N = 2*K, then  the
1225	      estimated  median is the average of the (K)'th and (K+1)'th obser-
1226	      vations.
1227	   Usually the term "estimated" is dropped from  the  phrase  "estimated
1228	   median" and this value is simply referred to as the "median".

1230	10.4. Testing For Goodness-of-Fit

1232	   For some forms of measurement calibration we need to test  whether  a
1233	   set  of  numbers  is  consistent with those numbers having been drawn
1234	   from a particular distribution.  An example is that to apply a  self-
1235	   consistency  check  to measurements made using a Poisson process, one
1236	   test is to see whether the spacing between the  sampling  times  does
1237	   indeed  reflect  an exponential distribution; or if the dT/N approach
1238	   discussed above was used, whether the times are uniformly distributed
1239	   across [T, dT].

1241	   {Comment: There are at least three possible sets of values  we  could
1242	   test:  the  scheduled packet transmission times, as determined by use
1243	   of a pseudo-random number generator; user-level timestamps made  just
1244	   before or after the system call for transmitting the packet; and wire
1245	   times for the packets as recorded using a packet filter.   All  three
1246	   of  these  are  potentially  informative:  failures for the scheduled
1247	   times to match an exponential distribution indicate  inaccuracies  in
1248	   the random number generation; failures for the user-level times indi-
1249	   cate inaccuracies in the timers used to  schedule  transmission;  and
1250	   failures  for  the  wire  times  indicate  inaccuracies  in  actually

1252	ID                Framework for IP Performance Metrics     February 1998

1254	   transmitting the packets, perhaps due  to  contention  for  a  shared
1255	   resource.}

1257	   There are a large number of  statistical  goodness-of-fit  techniques
1258	   for  performing  such  tests.   See [DS86] for a thorough discussion.
1259	   That reference recommends the Anderson-Darling EDF test  as  being  a
1260	   good  all-purpose  test,  as  well  as one that is especially good at
1261	   detecting deviations from a given distribution in the lower and upper
1262	   tails of the EDF.

1264	   It is important to understand  that  the  nature  of  goodness-of-fit
1265	   tests  is that one first selects a "significance level", which is the
1266	   probability that the test will erroneously declare that the EDF of  a
1267	   given  set  of  measurements fails to match a particular distribution
1268	   when in fact the measurements do indeed  reflect  that  distribution.
1269	   Unless otherwise stated, IPPM goodness-of-fit tests are done using 5%
1270	   significance.  This means that if the test is applied to 100  samples
1271	   and  5  of those samples are deemed to have failed the test, then the
1272	   samples are all consistent with the distribution  being  tested.   If
1273	   significantly  more of the samples fail the test, then the assumption
1274	   that the samples are consistent with the  distribution  being  tested
1275	   must  be  rejected.   If  significantly fewer of the samples fail the
1276	   test, then the samples have potentially been doctored too well to fit
1277	   the  distribution.   Similarly, some goodness-of-fit tests (including
1278	   Anderson-Darling) can detect whether it is likely that a given sample
1279	   was  doctored.   We also use a significance of 5% for this case; that
1280	   is, the test will report that a given honest sample is "too  good  to
1281	   be true" 5% of the time, so if the test reports this finding signifi-
1282	   cantly more often than one time out of twenty, it  is  an  indication
1283	   that something unusual is occurring.

1285	   The appendix gives sample  C  code  for  implementing  the  Anderson-
1286	   Darling test, as well as further discussing its use.

1288	   See [Pa94] for a discussion of goodness-of-fit  and  closeness-of-fit
1289	   tests in the context of network measurement.

1291	11. Avoiding Stochastic Metrics

1293	   When defining metrics applying to a path, subpath,  cloud,  or  other
1294	   network element, we in general do not define them in stochastic terms
1295	   (probabilities).  We instead prefer a deterministic definition.   So,
1296	   for  example, rather than defining a metric about a "packet loss pro-
1297	   bability between A and B", we would define a metric about  a  "packet
1298	   loss rate between A and B".  (A measurement given by the first defin-
1299	   ition might be "0.73", and by the second "73 packets out of 100".)

1301	ID                Framework for IP Performance Metrics     February 1998

1303	   We emphasize that the above distinction concerns the *definitions* of
1304	   *metrics*.  It is not intended to apply to what sort of techniques we
1305	   might use to analyze the results of measurements.

1307	   The reason for this distinction is as follows.  When definitions  are
1308	   made in terms of probabilities, there are often hidden assumptions in
1309	   the definition about a stochastic model of the behavior  being  meas-
1310	   ured.  The fundamental goal with avoiding probabilities in our metric
1311	   definitions is to avoid  biasing  our  definitions  by  these  hidden
1312	   assumptions.

1314	   For example, an easy hidden assumption to make is that packet loss in
1315	   a  network  component  due  to queueing overflows can be described as
1316	   something that happens to any given packet with a  particular  proba-
1317	   bility.   In  today's  Internet, however, queueing drops are actually
1318	   usually *deterministic*, and assuming that they should  be  described
1319	   probabilistically  can  obscure crucial correlations between queueing
1320	   drops among a set of packets.  So it's better to explicitly note sto-
1321	   chastic assumptions, rather than have them sneak into our definitions
1322	   implicitly.

1324	   This does *not* mean that we abandon stochastic  models  for  *under-
1325	   standing*  network  performance!  It only means that when defining IP
1326	   metrics we avoid terms such as "probability" for terms like  "propor-
1327	   tion"  or "rate".  We will still use, for example, random sampling in
1328	   order to estimate probabilities used by stochastic models related  to
1329	   the  IP metrics.  We also do not rule out the possibility of stochas-
1330	   tic metrics when they are truly appropriate (for example, perhaps  to
1331	   model transmission errors caused by certain types of line noise).

1333	12. Packets of Type P

1335	   A fundamental property of many Internet metrics is that the value  of
1336	   the metric depends on the type of IP packet(s) used to make the meas-
1337	   urement.  Consider an IP-connectivity metric: one  obtains  different
1338	   results  depending  on  whether one is interested in connectivity for
1339	   packets destined for well-known TCP ports or unreserved UDP ports, or
1340	   those with invalid IP checksums, or those with TTL's of 16, for exam-
1341	   ple.   In  some  circumstances  these  distinctions  will  be  highly
1342	   interesting  (for  example,  in  the  presence  of firewalls, or RSVP
1343	   reservations).

1345	   Because of this distinction, we introduce the  generic  notion  of  a
1346	   "packet  of  type  P",  where  in  some contexts P will be explicitly
1347	   defined (i.e., exactly  what  type  of  packet  we  mean),  partially
1348	   defined  (e.g., "with a payload of B octets"), or left generic.  Thus
1349	   we may talk about generic  IP-type-P-connectivity  or  more  specific

1351	ID                Framework for IP Performance Metrics     February 1998

1353	   IP-port-HTTP-connectivity.   Some  metrics  and  methodologies may be
1354	   fruitfully defined using generic type P definitions  which  are  then
1355	   made specific when performing actual measurements.

1357	   Whenever a metric's value depends on the type of the packets involved
1358	   in  the metric, the metric's name will include either a specific type
1359	   or a phrase such as "type-P".   Thus  we  will  not  define  an  "IP-
1360	   connectivity"  metric  but instead an "IP-type-P-connectivity" metric
1361	   and/or perhaps an "IP-port-HTTP-connectivity"  metric.   This  naming
1362	   convention serves as an important reminder that one must be conscious
1363	   of the exact type of traffic being measured.

1365	   A closely related note: it would be very useful to know  if  a  given
1366	   Internet  component  treats  equally  a class C of different types of
1367	   packets.  If so, then any one of those types of packets can  be  used
1368	   for subsequent measurement of the component.  This suggests we devise
1369	   a metric or suite of metrics that attempt to determine C.

1371	13. Internet Addresses vs. Hosts

1373	   When considering a metric for some path through the Internet,  it  is
1374	   often  natural  to think about it as being for the path from Internet
1375	   host H1 to host H2.  A definition in  these  terms,  though,  can  be
1376	   ambiguous,  because  Internet  hosts can be attached to more than one
1377	   network.  In this case, the result of the metric will depend on which
1378	   of these networks is actually used.

1380	   Because of this ambiguitiy, usually such definitions  should  instead
1381	   be defined in terms of Internet IP addresses.  For the common case of
1382	   a unidirectional path through the Internet,  we  will  use  the  term
1383	   "Src"  to  denote  the  IP  address of the beginning of the path, and
1384	   "Dst" to denote the IP address of the end.

1386	14. Standard-Formed Packets

1388	   Unless otherwise stated, all metric definitions that concern IP pack-
1389	   ets  include  an  implicit  assumption  that  the packet is *standard
1390	   formed*.  A packet is standard formed if it meets all of the  follow-
1391	   ing criteria:
1392	 +    Its length as given in the IP header corresponds to  the  size  of
1393	      the IP header plus the size of the payload.

1395	ID                Framework for IP Performance Metrics     February 1998

1397	 +    It includes a valid IP header: the version field is 4  (later,  we
1398	      will  expand  this  to  include 6); the header length is >= 5; the
1399	      checksum is correct.
1400	 +    It is not an IP fragment.
1401	 +    The source and destination addresses correspond to  the  hosts  in
1402	      question.
1403	 +    Either the packet possesses sufficient  TTL  to  travel  from  the
1404	      source to the destination if the TTL is decremented by one at each
1405	      hop, or it possesses the maximum TTL of 255.
1406	 +    It does not contain IP options unless explicitly noted.
1407	 +    If a transport header is present, it too contains a valid checksum
1408	      and other valid fields.
1409	   We further require that if a packet is described as having a  "length
1410	   of B octets", then 0 <= B <= 65535; and if B is the payload length in
1411	   octets, then B <= (65535-IP header size in octets).

1413	   So, for example, one might imagine defining an IP connectivity metric
1414	   as  "IP-type-P-connectivity  for  standard-formed packets with the IP
1415	   TOS field set to 0",  or,  more  succinctly,  "IP-type-P-connectivity
1416	   with  the  IP  TOS  field set to 0", since standard-formed is already
1417	   implied by convention.

1419	   A particular type of standard-formed packet often useful to  consider
1420	   is  the  "minimal  IP packet from A to B" - this is an IP packet with
1421	   the following properties:
1422	 +    It is standard-formed.
1423	 +    Its data payload is 0 octets.
1424	 +    It contains no options.
1425	   (Note that we do not define its protocol field, as  different  values
1426	   may lead to different treatment by the network.)

1428	   When defining IP metrics we keep in mind that no  packet  smaller  or
1429	   simpler  than  this  can be transmitted over a correctly operating IP
1430	   network.

1432	15. Acknowledgements

1434	   The comments of Brian Carpenter,  Bill  Cerveny,  Padma  Krishnaswamy
1435	   Jeff Sedayao and Howard Stanislevic are appreciated.

1437	ID                Framework for IP Performance Metrics     February 1998

1439	16. Security Considerations

1441	   This document concerns definitions and concepts related  to  Internet
1442	   measurement.   We  discuss  measurement procedures only in high-level
1443	   terms, regarding principles that lend themselves  to  sound  measure-
1444	   ment.   As  such,  the topics discussed do not affect the security of
1445	   the Internet or of applications which run on it.

1447	   That said, it should be recognized that conducting Internet  measure-
1448	   ments  can  raise  both  security and privacy concerns.  Active tech-
1449	   niques, in which traffic is injected into the network, can be  abused
1450	   for  denial-of-service  attacks  disguised  as legitimate measurement
1451	   activity.  Passive techniques, in which existing traffic is  recorded
1452	   and  analyzed,  can  expose the contents of Internet traffic to unin-
1453	   tended recipients.  Consequently, the definition of each  metric  and
1454	   methodology  must include a corresponding discussion of security con-
1455	   siderations.

1457	17. Appendix

1459	   Below we give routines written  in  C  for  computing  the  Anderson-
1460	   Darling  test  statistic (A2) for determining whether a set of values
1461	   is consistent with a given statistical distribution.  Externally, the
1462	   two main routines of interest are:
1463	       double exp_A2_known_mean(double x[], int n, double mean)
1464	       double unif_A2_known_range(double x[], int n,
1465	                                  double min_val, double max_val)
1466	   Both take as their first argument, x, the array of  n  values  to  be
1467	   tested.   (Upon return, the elements of x are sorted.)  The remaining
1468	   parameters characterize the distribution to be used: either the  mean
1469	   (1/lambda),  for  an exponential distribution, or the lower and upper
1470	   bounds, for a uniform distribution.  The names of the routines stress
1471	   that  these values must be known in advance, and *not* estimated from
1472	   the data (for example, by computing its sample mean).  Estimating the
1473	   parameters from the data *changes* the significance level of the test
1474	   statistic.  While [DS86] gives alternate significance tables for some
1475	   instances  in  which  the parameters are estimated from the data, for
1476	   our purposes we expect that we should indeed know the  parameters  in
1477	   advance,  since  what we will be testing are generally values such as
1478	   packet sending times that we wish to verify follow a known  distribu-
1479	   tion.

1481	   Both routines return a significance level, as described earlier. This
1482	   is  a  value  between 0 and 1.  The correct use of the routines is to
1483	   pick in advance the threshold for the  significance  level  to  test;
1484	   generally,  this will be 0.05, corresponding to 5%, as also described
1485	   above.  Subsequently, if the routines return a  value  strictly  less

1487	ID                Framework for IP Performance Metrics     February 1998

1489	   than this threshold, then the data are deemed to be inconsistent with
1490	   the presumed distribution, *subject to an error corresponding to  the
1491	   significance  level*.  That is, for a significance level of 5%, 5% of
1492	   the time data that is indeed drawn  from  the  presumed  distribution
1493	   will be erroneously deemed inconsistent.

1495	   Thus, it is important to bear in mind that if these routines are used
1496	   frequently,  then one will indeed encounter occasional failures, even
1497	   if the data is unblemished.

1499	   Another important point concerning significance levels is that it  is
1500	   unsound  to  compare  them in order to determine which of two sets of
1501	   values is a "better" fit to a presumed  distribution.   Such  testing
1502	   should  instead  be done using "closeness-of-fit metrics" such as the
1503	   lambda^2 metric described in [Pa94].

1505	   While the routines provided are for exponential and uniform distribu-
1506	   tions  with  known  parameters,  it  is generally straight-forward to
1507	   write comparable routines for any distribution with known parameters.
1508	   The  heart  of  the A2 tests lies in a statistic computed for testing
1509	   whether a set of values is consistent  with  a  uniform  distribution
1510	   between  0  and  1,  which  we  term  Unif(0, 1).  If we wish to test
1511	   whether a set of values, X, is consistent with a  given  distribution
1512	   G(x), we first compute
1513	       Y = G_inverse(X)
1514	   If X is indeed distributed according to G(x), then Y will be  distri-
1515	   buted  according  to Unif(0, 1); so by testing Y for consistency with
1516	   Unif(0, 1), we also test X for consistency with G(x).

1518	   We note, however, that the process of computing Y above  might  yield
1519	   values  of  Y outside the range (0..1).  Such values should not occur
1520	   if X is indeed distributed according to G(x), but easily can occur if
1521	   it  is  not.  In the latter case, we need to avoid computing the cen-
1522	   tral A2 statistic, since floating-point exceptions may occur  if  any
1523	   of  the  values  lie outside (0..1).  Accordingly, the routines check
1524	   for this possiblity, and if encountered, return a raw A2 statistic of
1525	   -1.  The routine that converts the raw A2 statistic to a significance
1526	   level likewise propagates this value, returning a significance  level
1527	   of -1.  So, any use of these routines must be prepared for a possible
1528	   negative significance level.

1530	   The last important point regarding use of A2  statistic  concerns  n,
1531	   the  number  of  values  being tested.  If n < 5 then the test is not
1532	   meaningful, and in this case a significance level of -1 is returned.

1534	   On the other hand, for "real"  data  the  test  *gains*  power  as  n
1535	   becomes  larger.   It  is well known in the statistics community that
1536	   real data almost never exactly matches  a  theoretical  distribution,

1538	ID                Framework for IP Performance Metrics     February 1998

1540	   even in cases such as rolling dice a great many times (see [Pa94] for
1541	   a brief discussion and references).  The A2 test is sensitive  enough
1542	   that,  for sufficiently large sets of real data, the test will almost
1543	   always fail, because it will manage to detect slight imperfections in
1544	   the fit of the data to the distribution.

1546	   For example, we have found that  when  testing  8,192  measured  wire
1547	   times  for packets sent at Poisson intervals, the measurements almost
1548	   always fail the A2 test.  On the other hand, testing 128 measurements
1549	   failed  at  5%  significance  only about 5% of the time, as expected.
1550	   Thus, in general, when the test fails, care must be taken  to  under-
1551	   stand why it failed.

1553	   The remainder of this appendix gives C code  for  the  routines  men-
1554	   tioned above.

1556	   /* Routines for computing the Anderson-Darling A2 test statistic.
1557	    *
1558	    * Implememented based on the description in "Goodness-of-Fit
1559	    * Techniques," R. D'Agostino and M. Stephens, editors,
1560	    * Marcel Dekker, Inc., 1986.
1561	    */

1563	   #include <stdio.h>
1564	   #include <stdlib.h>
1565	   #include <math.h>

1567	   /* Returns the raw A^2 test statistic for n sorted samples
1568	    * z[0] .. z[n-1], for z ~ Unif(0,1).
1569	    */
1570	   extern double compute_A2(double z[], int n);

1572	   /* Returns the significance level associated with a A^2 test
1573	    * statistic value of A2, assuming no parameters of the tested
1574	    * distribution were estimated from the data.
1575	    */
1576	   extern double A2_significance(double A2);

1578	   /* Returns the A^2 significance level for testing n observations
1579	    * x[0] .. x[n-1] against an exponential distribution with the
1580	    * given mean.
1581	    *
1582	    * SIDE EFFECT: the x[0..n-1] are sorted upon return.
1583	    */
1584	   extern double exp_A2_known_mean(double x[], int n, double mean);

1586	   /* Returns the A^2 significance level for testing n observations
1587	    * x[0] .. x[n-1] against the uniform distribution [min_val, max_val].

1589	ID                Framework for IP Performance Metrics     February 1998

1591	    *
1592	    * SIDE EFFECT: the x[0..n-1] are sorted upon return.
1593	    */
1594	   extern double unif_A2_known_range(double x[], int n,
1595	                       double min_val, double max_val);

1597	   /* Returns a pseudo-random number distributed according to an
1598	    * exponential distribution with the given mean.
1599	    */
1600	   extern double random_exponential(double mean);

1602	   /* Helper function used by qsort() to sort double-precision
1603	    * floating-point values.
1604	    */
1605	   static int
1606	   compare_double(const void *v1, const void *v2)
1607	   {
1608	       double d1 = *(double *) v1;
1609	       double d2 = *(double *) v2;

1611	       if (d1 < d2)
1612	           return -1;
1613	       else if (d1 > d2)
1614	           return 1;
1615	       else
1616	           return 0;
1617	   }

1619	   double
1620	   compute_A2(double z[], int n)
1621	   {
1622	       int i;
1623	       double sum = 0.0;

1625	       if ( n < 5 )
1626	           /* Too few values. */
1627	           return -1.0;

1629	       /* If any of the values are outside the range (0, 1) then
1630	        * fail immediately (and avoid a possible floating point
1631	        * exception in the code below).
1632	        */
1633	       for (i = 0; i < n; ++i)
1634	           if ( z[i] <= 0.0 || z[i] >= 1.0 )
1635	               return -1.0;

1637	       /* Page 101 of D'Agostino and Stephens. */

1639	ID                Framework for IP Performance Metrics     February 1998

1641	       for (i = 1; i <= n; ++i) {
1642	           sum += (2 * i - 1) * log(z[i-1]);
1643	           sum += (2 * n + 1 - 2 * i) * log(1.0 - z[i-1]);
1644	       }
1645	       return -n - (1.0 / n) * sum;
1646	   }

1648	   double
1649	   A2_significance(double A2)
1650	   {
1651	       /* Page 105 of D'Agostino and Stephens. */
1652	       if (A2 < 0.0)
1653	           return A2;    /* Bogus A2 value - propagate it. */

1655	       /* Check for possibly doctored values. */
1656	       if (A2 <= 0.201)
1657	           return 0.99;
1658	       else if (A2 <= 0.240)
1659	           return 0.975;
1660	       else if (A2 <= 0.283)
1661	           return 0.95;
1662	       else if (A2 <= 0.346)
1663	           return 0.90;
1664	       else if (A2 <= 0.399)
1665	           return 0.85;

1667	       /* Now check for possible inconsistency. */
1668	       if (A2 <= 1.248)
1669	           return 0.25;
1670	       else if (A2 <= 1.610)
1671	           return 0.15;
1672	       else if (A2 <= 1.933)
1673	           return 0.10;
1674	       else if (A2 <= 2.492)
1675	           return 0.05;
1676	       else if (A2 <= 3.070)
1677	           return 0.025;
1678	       else if (A2 <= 3.880)
1679	           return 0.01;
1680	       else if (A2 <= 4.500)
1681	           return 0.005;
1682	       else if (A2 <= 6.000)
1683	           return 0.001;
1684	       else
1685	           return 0.0;
1686	   }

1688	   double

1690	ID                Framework for IP Performance Metrics     February 1998

1692	   exp_A2_known_mean(double x[], int n, double mean)
1693	   {
1694	       int i;
1695	       double A2;

1697	       /* Sort the first n values. */
1698	       qsort(x, n, sizeof(x[0]), compare_double);

1700	       /* Assuming they match an exponential distribution, transform
1701	        * them to Unif(0,1).
1702	        */
1703	       for (i = 0; i < n; ++i) {
1704	           x[i] = 1.0 - exp(-x[i] / mean);
1705	       }

1707	       /* Now make the A^2 test to see if they're truly uniform. */
1708	       A2 = compute_A2(x, n);
1709	       return A2_significance(A2);
1710	   }

1712	   double
1713	   unif_A2_known_range(double x[], int n, double min_val, double max_val)
1714	   {
1715	       int i;
1716	       double A2;
1717	       double range = max_val - min_val;

1719	       /* Sort the first n values. */
1720	       qsort(x, n, sizeof(x[0]), compare_double);

1722	       /* Transform Unif(min_val, max_val) to Unif(0,1). */
1723	       for (i = 0; i < n; ++i)
1724	           x[i] = (x[i] - min_val) / range;

1726	       /* Now make the A^2 test to see if they're truly uniform. */
1727	       A2 = compute_A2(x, n);
1728	       return A2_significance(A2);
1729	   }

1731	   double
1732	   random_exponential(double mean)
1733	   {
1734	       return -mean * log1p(-drand48());
1735	   }

1737	ID                Framework for IP Performance Metrics     February 1998

1739	18. References

1741	   [AK96] G. Almes and S. Kalidindi, "A One-way Delay Metric for  IPPM",
1742	   work in progress, November 1997.

1744	   [BM92] I. Bilinskis and A. Mikelsons, Randomized  Signal  Processing,
1745	   Prentice Hall International, 1992.

1747	   [DS86] R. D'Agostino and M. Stephens, editors, Goodness-of-Fit  Tech-
1748	   niques, Marcel Dekker, Inc., 1986.

1750	   [CPB93] K. Claffy, G. Polyzos, and H-W. Braun, ``Application of  Sam-
1751	   pling  Methodologies  to  Network  Traffic  Characterization,'' Proc.
1752	   SIGCOMM '93, pp. 194-203, San Francisco, September 1993.

1754	   [FJ94] S. Floyd and V. Jacobson, ``The  Synchronization  of  Periodic
1755	   Routing  Messages,''  IEEE/ACM  Transactions on Networking, 2(2), pp.
1756	   122-136, April 1994.

1758	   [Mi92] D. Mills, "Network Time Protocol  (Version  3)  Specification,
1759	   Implementation and Analysis", RFC 1305, March 1992.

1761	   [Pa94] V. Paxson, ``Empirically-Derived Analytic Models of  Wide-Area
1762	   TCP  Connections,''  IEEE/ACM  Transactions  on Networking, 2(4), pp.
1763	   316-336, August 1994.

1765	   [Pa96] V. Paxson, ``Towards a Framework for Defining Internet Perfor-
1766	   mance       Metrics,''       Proceedings       of      INET      '96,
1767	   ftp://ftp.ee.lbl.gov/papers/metrics-framework-INET96.ps.Z

1769	   [Pa97] V. Paxson, ``Measurements and Analysis of End-to-End  Internet
1770	   Dynamics,''     Ph.D.     dissertation,    U.C.    Berkeley,    1997,
1771	   ftp://ftp.ee.lbl.gov/papers/vp-thesis/dis.ps.gz.

1773	19. Authors' Addresses

1775	   Vern Paxson <vern@ee.lbl.gov>
1776	   MS 50B/2239
1777	   Lawrence Berkeley National Laboratory
1778	   University of California
1779	   Berkeley, CA  94720
1780	   USA
1781	   Phone: +1 510/486-7504

1783	   Guy Almes <almes@advanced.org>
1784	   Advanced Network & Services, Inc.
1785	   200 Business Park Drive

1787	ID                Framework for IP Performance Metrics     February 1998

1789	   Armonk, NY  10504
1790	   USA
1791	   Phone: +1 914/273-7863

1793	   Jamshid Mahdavi <mahdavi@psc.edu>
1794	   Pittsburgh Supercomputing Center
1795	   4400 5th Avenue
1796	   Pittsburgh, PA  15213
1797	   USA
1798	   Phone: +1 412/268-6282

1800	   Matt Mathis <mathis@psc.edu>
1801	   Pittsburgh Supercomputing Center
1802	   4400 5th Avenue
1803	   Pittsburgh, PA  15213
1804	   USA
1805	   Phone: +1 412/268-3319