IP Performance Working Group                                   M. Mathis
Internet-Draft                                                Google, Inc
Intended status: Experimental                                   A. Morton
Expires: September 10, 2015                                     AT&T Labs
                                                            March 9, 2015

                  Model Based Bulk Performance Metrics
               draft-ietf-ippm-model-based-metrics-04.txt

Abstract

We introduce a new class of model based metrics designed to determine if an end-to-end Internet path can meet predefined bulk transport performance targets by applying a suite of IP diagnostic tests to successive subpaths.  The subpath-at-a-time tests can be robustly applied to key infrastructure, such as interconnects, to accurately detect if any part of the infrastructure will prevent the full end-to-end paths traversing it from meeting the specified target performance.

The diagnostic tests consist of precomputed traffic patterns and statistical criteria for evaluating packet delivery.  The traffic patterns are precomputed to mimic TCP or other transport protocols over a long path, but are constructed in such a way that they are independent of the actual details of the subpath under test, end systems or applications.  Likewise the success criteria depend on the packet delivery statistics of the subpath, as evaluated against a protocol model applied to the target performance.  The success criteria also do not depend on the details of the subpath, end systems or application.  This makes the measurements open loop, eliminating most of the difficulties encountered by traditional bulk transport metrics.

Model based metrics exhibit several important new properties not present in other Bulk Transport Capacity metrics, including the ability to reason about concatenated or overlapping subpaths.  The results are vantage independent, which is critical for supporting independent validation of test results from multiple measurement points.
This document does not define diagnostic tests directly, but provides a framework for designing suites of diagnostic tests that are tailored to confirming that infrastructure can meet the target performance.

Status of this Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF).  Note that other groups may also distribute working documents as Internet-Drafts.  The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time.  It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on September 10, 2015.

Copyright Notice

Copyright (c) 2015 IETF Trust and the persons identified as the document authors.  All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document.  Please review these documents carefully, as they describe your rights and restrictions with respect to this document.  Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

1. Introduction
   1.1. TODO
2. Terminology
3. New requirements relative to RFC 2330
4. Background
   4.1. TCP properties
   4.2. Diagnostic Approach
5. Common Models and Parameters
   5.1. Target End-to-end parameters
   5.2. Common Model Calculations
   5.3. Parameter Derating
6. Common testing procedures
   6.1. Traffic generating techniques
        6.1.1. Paced transmission
        6.1.2. Constant window pseudo CBR
        6.1.3. Scanned window pseudo CBR
        6.1.4. Concurrent or channelized testing
   6.2. Interpreting the Results
        6.2.1. Test outcomes
        6.2.2. Statistical criteria for estimating run_length
        6.2.3. Reordering Tolerance
   6.3. Test Preconditions
7. Diagnostic Tests
   7.1. Basic Data Rate and Delivery Statistics Tests
        7.1.1. Delivery Statistics at Paced Full Data Rate
        7.1.2. Delivery Statistics at Full Data Windowed Rate
        7.1.3. Background Delivery Statistics Tests
   7.2. Standing Queue Tests
        7.2.1. Congestion Avoidance
        7.2.2. Bufferbloat
        7.2.3. Non excessive loss
        7.2.4. Duplex Self Interference
   7.3. Slowstart tests
        7.3.1. Full Window slowstart test
        7.3.2. Slowstart AQM test
   7.4. Sender Rate Burst tests
   7.5. Combined and Implicit Tests
        7.5.1. Sustained Bursts Test
        7.5.2. Streaming Media
8. An Example
9. Validation
10. Security Considerations
11. Acknowledgements
12. IANA Considerations
13. References
    13.1. Normative References
    13.2. Informative References
Appendix A. Model Derivations
   A.1. Queueless Reno
Appendix B. Complex Queueing
Appendix C. Version Control
Authors' Addresses

1. Introduction

Bulk performance metrics evaluate an Internet path's ability to carry bulk data.  Model based bulk performance metrics rely on mathematical TCP models to design a targeted diagnostic suite (TDS) of IP performance tests which can be applied independently to each subpath of the full end-to-end path.  These targeted diagnostic suites allow independent tests of subpaths to accurately detect if any subpath will prevent the full end-to-end path from delivering bulk data at the specified performance target, independent of the measurement vantage points or other details of the test procedures used for each measurement.

The end-to-end target performance is determined by the needs of the user or application, which are outside the scope of this document.  For bulk data transport, the primary performance parameter of interest is the target data rate.  However, since TCP's ability to compensate for less than ideal network conditions is fundamentally affected by the Round Trip Time (RTT) and the Maximum Transmission Unit (MTU) of the entire end-to-end path that the data traverses, these parameters must also be specified in advance.  They may reflect a specific real path through the Internet or an idealized path representing a typical user community.  The target values for these three parameters, Data Rate, RTT and MTU, inform the mathematical models used to design the TDS.

Each IP diagnostic test in a TDS consists of a precomputed traffic pattern and statistical criteria for evaluating packet delivery.
Mathematical models are used to design traffic patterns that mimic TCP or another bulk transport protocol operating at the target data rate, MTU and RTT over a full range of conditions, including flows that are bursty at multiple time scales.  The traffic patterns are computed in advance based on the three target parameters of the end-to-end path and are independent of the properties of individual subpaths.  As much as possible the measurement traffic is generated deterministically in ways that minimize the extent to which test methodology, measurement points, measurement vantage or path partitioning affect the details of the measurement traffic.

Mathematical models are also used to compute the bounds on the packet delivery statistics for acceptable IP performance.  Since these statistics, such as packet loss, are typically aggregated from all subpaths of the end-to-end path, the end-to-end statistical bounds need to be apportioned as a separate bound for each subpath.  Note that links that are expected to be bottlenecks are also expected to contribute a larger fraction of the total packet loss and/or delay.  In compensation, other links have to be constrained to contribute less packet loss and delay.  The criterion for passing each test of a TDS is an apportioned share of the total bound determined by the mathematical model from the end-to-end target performance.

In addition to passing or failing, a test can be deemed to be inconclusive for a number of reasons, including: the precomputed traffic pattern was not accurately generated; the measurement results were not statistically significant; or other reasons, such as failing to meet some required test preconditions.

This document describes a framework for deriving traffic patterns and delivery statistics for model based metrics.  It does not fully specify any measurement techniques.  Important details such as packet type-p selection, sampling techniques, vantage selection, etc. are not specified here.  We imagine Fully Specified Targeted Diagnostic Suites (FSTDS) that define all of these details.  We use TDS to refer to the subset of such a specification that is in scope for this document.  A TDS includes the target parameters, documentation of the models and assumptions used to derive the diagnostic test parameters, specifications for the traffic and delivery statistics for the tests themselves, and a description of a test setup that can be used to validate the tests and models.

Section 2 defines terminology used throughout this document.

It has been difficult to develop Bulk Transport Capacity [RFC3148] metrics due to some overlooked requirements described in Section 3 and some intrinsic problems with using protocols for measurement, described in Section 4.

In Section 5 we describe the models and common parameters used to derive the targeted diagnostic suite.  In Section 6 we describe common testing procedures.  Each subpath is evaluated using a suite of far simpler and more predictable diagnostic tests described in Section 7.  In Section 8 we present an example TDS that might be representative of HD video, and illustrate how MBM can be used to address difficult measurement situations, such as confirming that intercarrier exchanges have sufficient performance and capacity to deliver HD video between ISPs.
There exists a small risk that the model based metrics themselves might yield a false pass result, in the sense that every subpath of an end-to-end path passes every IP diagnostic test and yet a real application fails to attain the performance target over the end-to-end path.  If this happens, then the validation procedure described in Section 9 needs to be used to check and potentially revise the models.

Future documents may define model based metrics for other traffic classes and application types, such as real time streaming media.

1.1. TODO

This section to be removed prior to publication.

Please send comments about this draft to ippm@ietf.org.  See http://goo.gl/02tkD for more information including: interim drafts, an up to date todo list and information on contributing.

Formatted: Mon Mar 9 14:37:24 PDT 2015

2. Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

For terminology about paths, etc., see [RFC2330] and [RFC7398].

[data] sender: Host sending data and receiving ACKs.
[data] receiver: Host receiving data and sending ACKs.
subpath: A portion of the full path.  Note that there is no requirement that subpaths be non-overlapping.
Measurement Point: Measurement points as described in [RFC7398].
test path: A path between two measurement points that includes a subpath of the end-to-end path under test, and could include infrastructure between the measurement points and the subpath.
[Dominant] Bottleneck: The bottleneck that generally dominates traffic statistics for the entire path.  It typically determines a flow's self clock timing, packet loss and ECN marking rate.  See Section 4.1.
front path: The subpath from the data sender to the dominant bottleneck.
back path: The subpath from the dominant bottleneck to the receiver.
return path: The path taken by the ACKs from the data receiver to the data sender.
cross traffic: Other, potentially interfering, traffic competing for network resources (bandwidth and/or queue capacity).

The following properties are determined by the end-to-end path and the application.  They are described in more detail in Section 5.1.

Application Data Rate: General term for the data rate as seen by the application above the transport layer.  This is the payload data rate, and excludes transport and lower level headers (TCP/IP or other protocols) as well as retransmissions and other data that does not contribute to the total quantity of data delivered to the application.
Link Data Rate: General term for the data rate as seen by the link or lower layers.  The link data rate includes transport and IP headers, retransmissions and other transport layer overhead.  This document is agnostic as to whether the link data rate includes or excludes framing, MAC, or other lower layer overheads, except that they must be treated uniformly.
end-to-end target parameters: Application or transport performance goals for the end-to-end path.  They include the target data rate, RTT and MTU described below.
Target Data Rate: The application data rate, typically the ultimate user's performance goal.
Target RTT (Round Trip Time): The baseline (minimum) RTT of the longest end-to-end path over which the application expects to be able to meet the target performance.  TCP and other transport protocols' ability to compensate for path problems is generally proportional to the number of round trips per second.  The Target RTT determines both key parameters of the traffic patterns (e.g. burst sizes) and the thresholds on acceptable traffic statistics.  The Target RTT must be specified considering authentic packet sizes: MTU sized packets on the forward path, ACK sized packets (typically header_overhead) on the return path.
Target MTU (Maximum Transmission Unit): The maximum MTU supported by the end-to-end path over which the application expects to meet the target performance.  Assume 1500 Byte packets unless otherwise specified.  If some subpath forces a smaller MTU, then it becomes the target MTU, and all model calculations and subpath tests must use the same smaller MTU.
Effective Bottleneck Data Rate: This is the bottleneck data rate inferred from the ACK stream, by looking at how much data the ACK stream reports delivered per unit time.  If the path is thinning ACKs or batching packets the effective bottleneck rate can be much higher than the average link rate.  See Section 4.1 and Appendix B for more details.
[sender | interface] rate: The burst data rate, constrained by the data sender's interfaces.  Today 1 or 10 Gb/s are typical.
Header_overhead: The IP and TCP header sizes, which are the portion of each MTU not available for carrying application payload.  Without loss of generality this is assumed to be the size for returning acknowledgements (ACKs).  For TCP, the Maximum Segment Size (MSS) is the Target MTU minus the header_overhead.

The following basic parameters are common to the models and the subpath tests.  They are described in more detail in Section 5.2.  Note that they are mixed between application transport performance (which excludes headers) and link IP performance (which includes headers).

pipe size: A general term for the number of packets needed in flight (the window size) to exactly fill some network path or subpath.  This window size normally marks the onset of queueing.
target_pipe_size: The number of packets in flight (the window size) needed to exactly meet the target rate, with a single stream and no cross traffic, for the specified application target data rate, RTT, and MTU.  It is the amount of circulating data required to meet the target data rate, and implies the scale of the bursts that the network might experience.
run length: A general term for the observed, measured, or specified number of packets that are (to be) delivered between losses or ECN marks.  Nominally one over the loss or ECN marking probability, if they are independently and identically distributed.
target_run_length: The target_run_length is an estimate of the minimum number of good packets needed between losses or ECN marks to attain the target_data_rate over a path with the specified target_RTT and target_MTU, as computed by a mathematical model of TCP congestion control.  A reference calculation is shown in Section 5.2 and alternatives in Appendix A.

Ancillary parameters used for some tests:

derating: Under some conditions the standard models are too conservative.
The modeling framework permits some latitude in relaxing or "derating" some test parameters as described in Section 5.3, in exchange for more stringent TDS validation procedures, described in Section 9.
subpath_data_rate: The maximum IP data rate supported by a subpath.  This typically includes TCP/IP overhead, including headers, retransmits, etc.
test_path_RTT: The RTT between two measurement points using appropriate data and ACK packet sizes.
test_path_pipe: The amount of data necessary to fill a test path.  Nominally the test path RTT times the subpath_data_rate (which should be part of the end-to-end subpath).
test_window: The window necessary to meet the target_rate over a subpath.  Typically test_window=target_data_rate*test_RTT/(target_MTU - header_overhead).

Tests can be classified into groups according to their applicability.

Capacity tests: determine if a network subpath has sufficient capacity to deliver the target performance.  As long as the test traffic is within the proper envelope for the target end-to-end performance, the average packet losses or ECN marks must be below the threshold computed by the model.  As such, capacity tests reflect parameters that can transition from passing to failing as a consequence of cross traffic, additional presented load or the actions of other network users.  By definition, capacity tests also consume significant network resources (data capacity and/or buffer space), and the test schedules must be balanced by their cost.
Monitoring tests: are designed to capture the most important aspects of a capacity test, but without presenting excessive ongoing load themselves.  As such they may miss some details of the network's performance, but can serve as a useful reduced-cost proxy for a capacity test.
Engineering tests: evaluate how network algorithms (such as AQM and channel allocation) interact with TCP-style self clocked protocols and adaptive congestion control based on packet loss and ECN marks.  These tests are likely to have complicated interactions with cross traffic and under some conditions can be inversely sensitive to load.  For example a test to verify that an AQM algorithm causes ECN marks or packet drops early enough to limit queue occupancy may experience a false pass result in the presence of cross traffic.  It is important that engineering tests be performed under a wide range of conditions, including both in situ and bench testing, and over a wide variety of load conditions.  Ongoing monitoring is less likely to be useful for engineering tests, although sparse in situ testing might be appropriate.

General Terminology:

Targeted Diagnostic Suite (TDS): A set of IP diagnostic tests designed to determine if a subpath can sustain flows at a specific target_data_rate over a path that has a target_RTT, using target_MTU sized packets.
Fully Specified Targeted Diagnostic Suite (FSTDS): A TDS together with additional specifications such as "type-p", etc., which are out of scope for this document, but need to be drawn from other standards documents.
apportioned: To divide and allocate, as in budgeting packet loss rates across multiple subpaths such that they accumulate to less than a specified end-to-end loss rate.
open loop: A control theory term used to describe a class of techniques where systems that naturally exhibit circular dependencies can be analyzed by suppressing some of the dependencies, such that the resulting dependency graph is acyclic.

Bulk performance metrics: Bulk performance metrics evaluate an Internet path's ability to carry bulk data, such as transporting large files, streaming (non-real time) video, and at some scales, web images and content.  (For very fast networks, web performance is dominated by pure RTT effects.)  The metrics presented in this document reflect the evolution of [RFC3148].
traffic patterns: The temporal patterns or statistics of traffic generated by applications over transport protocols such as TCP.  There are several mechanisms that cause bursts at various time scales.  Our goal here is to mimic the range of common patterns (burst sizes and rates, etc), without tying our applicability to specific applications, implementations or technologies, which are sure to become stale.
delivery statistics: Raw or summary statistics about packet delivery properties of the IP layer including packet losses, ECN marks, reordering, or any other properties that may be germane to transport performance.
IP performance tests: Measurements or diagnostic tests to determine delivery statistics.

3. New requirements relative to RFC 2330

Model Based Metrics are designed to fulfill some additional requirements that were not recognized at the time RFC 2330 was written [RFC2330].  These missing requirements may have significantly contributed to policy difficulties in the IP measurement space.  Some additional requirements are:
o IP metrics must be actionable by the ISP - they have to be interpreted in terms of behaviors or properties at the IP or lower layers, that an ISP can test, repair and verify.
o Metrics should be spatially composable, such that measures of concatenated paths should be predictable from subpaths.  Ideally they should also be differentiable: the metrics of a subpath should be predictable from the difference between measurements of paths that do and do not include that subpath.
o Metrics must be vantage point invariant over a significant range of measurement point choices, including off path measurement points.  The only requirements on MP selection should be that the portion of the test path between the MP and the subpath under test is effectively ideal, or is non-ideal in ways that can be calibrated out of the measurements, and that the test RTT between the MPs is below some reasonable bound.
o Metrics must be repeatable by multiple parties with no specialized access to MPs or diagnostic infrastructure.  It must be possible for different parties to make the same measurement and observe the same results.  In particular it is specifically important that both a consumer (or their delegate) and ISP be able to perform the same measurement and get the same result.  Note that vantage independence is key to this requirement.

4. Background

At the time the IPPM WG was chartered, sound Bulk Transport Capacity measurement was known to be well beyond our capabilities.  In hindsight it is now clear why it is such a hard problem:
o TCP is a control system with circular dependencies - everything affects performance, including components that are explicitly not part of the test.
o Congestion control is an equilibrium process, such that transport protocols change the network (raise the loss probability and/or RTT) to conform to their behavior.
o TCP's ability to compensate for network flaws is directly proportional to the number of roundtrips per second (i.e. inversely proportional to the RTT).  As a consequence a flawed link may pass a short RTT local test even though it fails when the path is extended by a perfect network to some larger RTT.
o TCP has a meta Heisenberg problem - measurement and cross traffic interact in unknown and ill defined ways.  The situation is actually worse than the traditional physics problem where you can at least estimate bounds on the relative momentum of the measurement and measured particles.  For network measurement you cannot in general determine the relative "elasticity" of the measurement traffic and cross traffic, so you cannot even gauge the relative magnitude of their effects on each other.

These properties are a consequence of the equilibrium behavior intrinsic to how all throughput optimizing protocols interact with the Internet.  The protocols rely on control systems based on multiple network estimators to regulate the quantity of data traffic sent into the network.  The data traffic in turn alters the network and the properties observed by the estimators, such that there are circular dependencies between every component and every property.  Since some of these properties are non-linear, the entire system is nonlinear, and any change anywhere causes difficult to predict changes in every parameter.

Model Based Metrics overcome these problems by forcing the measurement system to be open loop: the delivery statistics (akin to the network estimators) do not affect the traffic or traffic patterns (bursts), which are computed on the basis of the target performance.  In order for a network to pass, the resulting delivery statistics and corresponding network estimators have to be such that they would not cause the control systems to slow the traffic below the target rate.

4.1. TCP properties

TCP and SCTP are self clocked protocols.  The dominant steady state behavior is to have an approximately fixed quantity of data and acknowledgements (ACKs) circulating in the network.  The receiver reports arriving data by returning ACKs to the data sender, and the data sender typically responds by sending exactly the same quantity of data back into the network.  The total quantity of data plus the data represented by ACKs circulating in the network is referred to as the window.  The mandatory congestion control algorithms incrementally adjust the window by sending slightly more or less data in response to each ACK.  The fundamentally important property of this system is that it is entirely self clocked: the data transmissions are a reflection of the ACKs that were delivered by the network, and the ACKs are a reflection of the data arriving from the network.

A number of phenomena can cause bursts of data, even in idealized networks that are modeled as simple queueing systems.

During slowstart the data rate is doubled on each RTT by sending twice as much data as was delivered to the receiver on the prior RTT.
For slowstart to be able to fill such a network, the network must be able to tolerate slowstart bursts up to the full pipe size inflated by the anticipated window reduction on the first loss or ECN mark.  For example, with classic Reno congestion control, an optimal slowstart has to end with a burst that is twice the bottleneck rate for exactly one RTT in duration.  This burst causes a queue which is exactly equal to the pipe size (i.e. the window is exactly twice the pipe size), so when the window is halved in response to the first loss, the new window will be exactly the pipe size.

Note that if the bottleneck data rate is significantly slower than the rest of the path, the slowstart bursts will not cause significant queues anywhere else along the path; they primarily exercise the queue at the dominant bottleneck.

Other sources of bursts include application pauses and channel allocation mechanisms.  Appendix B describes the treatment of channel allocation systems.  If the application pauses (stops reading or writing data) for some fraction of one RTT, state-of-the-art TCP catches up to the earlier window size by sending a burst of data at the full sender interface rate.  To fill such a network with a realistic application, the network has to be able to tolerate interface rate bursts from the data sender large enough to cover application pauses.

Although the interface rate bursts are typically smaller than the last burst of a slowstart, they are at a higher data rate so they potentially exercise queues at arbitrary points along the front path from the data sender up to and including the queue at the dominant bottleneck.  There is no model for what frequency or size of sender rate bursts should be tolerated.

To verify that a path can meet a performance target, it is necessary to independently confirm that the path can tolerate bursts in the dimensions that can be caused by these mechanisms.  Three cases are likely to be sufficient:

o Slowstart bursts sufficient to get connections started properly.
o Frequent sender interface rate bursts that are small enough that they can be assumed not to significantly affect delivery statistics.  (Implicitly derated by selecting the burst size.)
o Infrequent sender interface rate full target_pipe_size bursts that do affect the delivery statistics.  (Target_run_length may be derated.)

4.2. Diagnostic Approach

The MBM approach is to open loop TCP by precomputing traffic patterns that are typically generated by TCP operating at the given target parameters, and evaluating delivery statistics (packet loss, ECN marks and delay).  In this approach the measurement software explicitly controls the data rate, transmission pattern or cwnd (TCP's primary congestion control state variable) to create repeatable traffic patterns that mimic TCP behavior but are independent of the actual behavior of the subpath under test.  These patterns are manipulated to probe the network to verify that it can deliver all of the traffic patterns that a transport protocol is likely to generate under normal operation at the target rate and RTT.
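As a purely illustrative aid (not part of the framework), the following Python sketch restates the dimensions of the terminal slowstart burst described in Section 4.1; the function and parameter names, and the example values, are hypothetical.

   def slowstart_burst_dimensions(target_pipe_size, target_rtt):
       """Largest slowstart burst a path must tolerate, assuming classic
       Reno congestion control that halves the window on the first loss
       or ECN mark (see Section 4.1)."""
       return {
           "window_packets": 2 * target_pipe_size,  # window peaks at twice the pipe size
           "queue_packets": target_pipe_size,       # standing queue at the dominant bottleneck
           "burst_rate_multiplier": 2,              # twice the bottleneck rate ...
           "burst_duration_seconds": target_rtt,    # ... for exactly one RTT
       }

   # Hypothetical example: a 70 packet pipe and a 100 ms target RTT imply a
   # 140 packet burst that momentarily queues 70 packets at the bottleneck.
   print(slowstart_burst_dimensions(70, 0.100))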
By opening the protocol control loops, we remove most sources of temporal and spatial correlation in the traffic delivery statistics, such that each subpath's contribution to the end-to-end statistics can be assumed to be independent and stationary.  (The delivery statistics depend on the fine structure of the data transmissions, but not on long time scale state embedded in the sender, receiver or other network components.)  Therefore each subpath's contribution to the end-to-end delivery statistics can be assumed to be independent, and spatial composition techniques such as [RFC5835] and [RFC6049] apply.

In typical networks, the dominant bottleneck contributes the majority of the packet loss and ECN marks.  Often the rest of the path makes an insignificant contribution to these properties.  A TDS should apportion the end-to-end budget for the specified parameters (primarily packet loss and ECN marks) to each subpath or group of subpaths.  For example the dominant bottleneck may be permitted to contribute 90% of the loss budget, while the rest of the path is only permitted to contribute 10%.

A TDS or FSTDS MUST apportion all relevant packet delivery statistics between successive subpaths, such that the spatial composition of the apportioned metrics will yield end-to-end statistics which are within the bounds determined by the models.

A network is expected to be able to sustain a Bulk TCP flow of a given data rate, MTU and RTT when all of the following conditions are met:
1. The raw link rate is higher than the target data rate.  See Section 7.1 or any number of data rate tests outside of MBM.
2. The observed packet delivery statistics are better than required by a suitable TCP performance model (e.g. fewer losses or ECN marks).  See Section 7.1 or any number of low rate packet loss tests outside of MBM.
3. There is sufficient buffering at the dominant bottleneck to absorb a slowstart rate burst large enough to get the flow out of slowstart at a suitable window size.  See Section 7.3.
4. There is sufficient buffering in the front path to absorb and smooth sender interface rate bursts at all scales that are likely to be generated by the application, any channel arbitration in the ACK path or any other mechanisms.  See Section 7.4.
5. When there is a standing queue at a bottleneck for a shared media subpath (e.g. half duplex), there are suitable bounds on how the data and ACKs interact, for example due to the channel arbitration mechanism.  See Section 7.2.4.
6. When there is a slowly rising standing queue at the bottleneck, the onset of packet loss has to be at an appropriate point (time or queue depth) and progressive.  See Section 7.2.

Note that conditions 1 through 4 require load tests for confirmation, and thus need to be monitored on an ongoing basis.  Conditions 5 and 6 require engineering tests.  They won't generally fail due to load, but may fail in the field due to configuration errors, etc., and should be spot checked.

We are developing a tool that can perform many of the tests described here [MBMSource].

5. Common Models and Parameters

5.1. Target End-to-end parameters

The target end-to-end parameters are the target data rate, target RTT and target MTU as defined in Section 2.
These parameters are determined by the needs of the application or the ultimate end user and the end-to-end Internet path over which the application is expected to operate.  The target parameters are in units that make sense to upper layers: payload bytes delivered to the application, above TCP.  They exclude overheads associated with TCP and IP headers, retransmits and other protocols (e.g. DNS).

Other end-to-end parameters defined in Section 2 include the effective bottleneck data rate, the sender interface data rate and the TCP/IP header sizes (overhead).

The target data rate must be smaller than all link data rates by enough headroom to carry the transport protocol overhead, explicitly including retransmissions and an allowance for fluctuations in the actual data rate needed to meet the specified average rate.  Specifying a target rate with insufficient headroom is likely to result in brittle measurements having little predictive value.

Note that the target parameters can be specified for a hypothetical path, for example to construct a TDS designed for bench testing in the absence of a real application, or for a real physical test, for in situ testing of production infrastructure.

The number of concurrent connections is explicitly not a parameter to this model.  If a subpath requires multiple connections in order to meet the specified performance, that must be stated explicitly and the procedure described in Section 6.1.4 applies.

5.2. Common Model Calculations

The end-to-end target parameters are used to derive the target_pipe_size and the reference target_run_length.

The target_pipe_size is the average window size in packets needed to meet the target rate, for the specified target RTT and MTU.  It is given by:

target_pipe_size = ceiling( target_rate * target_RTT / ( target_MTU - header_overhead ) )

Target_run_length is an estimate of the minimum required number of unmarked packets that must be delivered between losses or ECN marks, as computed by a mathematical model of TCP congestion control.  The derivation here follows [MSMO97], and by design is quite conservative.  The alternate models described in Appendix A generally yield smaller run_lengths (higher acceptable loss or ECN marking rates), but may not apply in all situations.  A FSTDS that uses an alternate model MUST compare it to the reference target_run_length computed here.

Reference target_run_length is derived as follows: assume the subpath_data_rate is infinitesimally larger than the target_data_rate plus the required header_overhead.  Then target_pipe_size also predicts the onset of queueing.  A larger window will cause a standing queue at the bottleneck.

Assume the transport protocol is using standard Reno style Additive Increase, Multiplicative Decrease congestion control [RFC5681] (but not Appropriate Byte Counting [RFC3465]) and the receiver is using standard delayed ACKs.  Reno increases the window by one packet every pipe_size worth of ACKs.  With delayed ACKs this takes 2 Round Trip Times per increase.  To exactly fill the pipe, losses must be no closer than when the peak of the AIMD sawtooth reaches exactly twice the target_pipe_size; otherwise the multiplicative window reduction triggered by the loss would cause the network to be underfilled.
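For illustration only (this is not part of the specification), the two reference calculations of this section can be sketched in Python; target_run_length uses the 3*(target_pipe_size^2) result derived in the remainder of this section, and the example values, including the 52 byte header_overhead, are hypothetical.

   import math

   def target_pipe_size(target_rate, target_rtt, target_mtu, header_overhead):
       """Average window, in packets, needed to meet the target rate."""
       return math.ceil(target_rate * target_rtt / (target_mtu - header_overhead))

   def target_run_length(pipe_size):
       """Reference minimum spacing between losses or ECN marks, in packets."""
       return 3 * pipe_size ** 2

   # Hypothetical example: 1,000,000 bytes/s payload rate, 100 ms RTT,
   # 1500 byte MTU and 52 bytes of TCP/IP header overhead.
   pipe = target_pipe_size(1_000_000, 0.100, 1500, 52)
   print(pipe, target_run_length(pipe))     # -> 70 14700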
Following [MSMO97], the number of packets between losses must be the area under the AIMD sawtooth.  They must be no more frequent than every 1 in ((3/2)*target_pipe_size)*(2*target_pipe_size) packets, which simplifies to:

target_run_length = 3*(target_pipe_size^2)

Note that this calculation is very conservative and is based on a number of assumptions that may not apply.  Appendix A discusses these assumptions and provides some alternative models.  If a different model is used, a fully specified TDS or FSTDS MUST document the actual method for computing target_run_length and the ratio between the alternate target_run_length and the reference target_run_length calculated above, along with a discussion of the rationale for the underlying assumptions.

These two parameters, target_pipe_size and target_run_length, directly imply most of the individual parameters for the tests in Section 7.

5.3. Parameter Derating

Since some aspects of the models are very conservative, the MBM framework permits some latitude in derating test parameters.  Rather than trying to formalize more complicated models we permit some test parameters to be relaxed as long as they meet some additional procedural constraints:

o The TDS or FSTDS MUST document and justify the actual method used to compute the derated metric parameters.
o The validation procedures described in Section 9 must be used to demonstrate the feasibility of meeting the performance targets with infrastructure that infinitesimally passes the derated tests.
o The validation process itself must be documented in such a way that other researchers can duplicate the validation experiments.

Except as noted, all tests below assume no derating.  Tests where there is not currently a well established model for the required parameters explicitly include derating as a way to indicate flexibility in the parameters.

6. Common testing procedures

6.1. Traffic generating techniques

6.1.1. Paced transmission

Paced (burst) transmissions: send bursts of data on a timer to meet a particular target rate and pattern.  In all cases the specified data rate can be either the application or the link rate.  Header overheads must be included in the calculations as appropriate.

Headway: Time interval between packets or bursts, specified from the start of one to the start of the next.  e.g. if packets are sent with a 1 ms headway, there will be exactly 1000 packets per second.
Paced single packets: Send individual packets at the specified rate or headway.
Burst: Send sender interface rate bursts on a timer.  Specify any 3 of: average rate, packet size, burst size (number of packets) and burst headway (burst start to start).  These bursts are typically sent as back-to-back packets at the tester's interface rate.
Slowstart bursts: Send 4 packet sender interface rate bursts at an average data rate equal to twice the effective bottleneck link rate (but not more than the sender interface rate).  This corresponds to the average rate during a TCP slowstart when Appropriate Byte Counting [RFC3465] is present or delayed ACK is disabled.  Note that if the effective bottleneck link rate is more than half of the sender interface rate, slowstart rate bursts become sender interface rate bursts.
Repeated Slowstart bursts: Slowstart bursts are typically part of a larger scale pattern of repeated bursts, such as sending target_pipe_size packets as slowstart bursts on a target_RTT headway (burst start to burst start).  Such a stream has three different average rates, depending on the averaging interval.  At the finest time scale the average rate is the same as the sender interface rate, at a medium scale the average rate is twice the effective bottleneck link rate and at the longest time scales the average rate is equal to the target data rate.

Note that in conventional measurement theory, exponential distributions are often used to eliminate many sorts of correlations.  For the procedures above, the correlations are created by the network elements and accurately reflect their behavior.  At some point in the future, it will be desirable to introduce noise sources into the above pacing models, but they are not warranted at this time.

6.1.2. Constant window pseudo CBR

Implement pseudo constant bit rate by running a standard protocol such as TCP with a fixed window size, such that it is self clocked.  Data packets arriving at the receiver trigger acknowledgements (ACKs) which travel back to the sender where they trigger additional transmissions.  The window size is computed from the target_data_rate and the actual RTT of the test path.  The rate is only maintained on average over each RTT, and is subject to limitations of the transport protocol.

Since the window size is constrained to be an integer number of packets, for small RTTs or low data rates there may not be sufficiently precise control over the data rate.  Rounding the window size up (the default) is likely to result in data rates that are higher than the target rate, but reducing the window by one packet may result in data rates that are too small.  Also cross traffic potentially raises the RTT, implicitly reducing the rate.  Cross traffic that raises the RTT nearly always makes the test more strenuous.  A FSTDS specifying a constant window CBR test MUST explicitly indicate under what conditions errors in the data rate cause tests to be inconclusive.  See the discussion of test outcomes in Section 6.2.1.

Since constant window pseudo CBR testing is sensitive to RTT fluctuations, it cannot accurately control the data rate in environments with fluctuating delays.

6.1.3. Scanned window pseudo CBR

Scanned window pseudo CBR is similar to the constant window CBR described above, except the window is scanned across a range of sizes designed to include two key events, the onset of queueing and the onset of packet loss or ECN marks.  The window is scanned by incrementing it by one packet every 2*target_pipe_size delivered packets.  This mimics the additive increase phase of standard TCP congestion avoidance when delayed ACKs are in effect.  It normally separates the window increases by approximately twice the target_RTT.

There are two ways to implement this test: one built by applying a window clamp to standard congestion control in a standard protocol such as TCP, and the other built by stiffening a non-standard transport protocol.  When standard congestion control is in effect, any losses or ECN marks cause the transport to revert to a window smaller than the clamp, such that the scanning clamp loses control of the window size.
The NPAD pathdiag tool is an example of this class of algorithms [Pathdiag].

Alternatively a non-standard congestion control algorithm can respond to losses by transmitting extra data, such that it maintains the specified window size independent of losses or ECN marks.  Such a stiffened transport explicitly violates mandatory Internet congestion control [RFC5681] and is not suitable for in situ testing.  It is only appropriate for engineering testing under laboratory conditions.  The Windowed Ping tool implements such a test [WPING].  The tool described in the paper has been updated [mpingSource].

The test procedures in Section 7.2 describe how to partition the scans into regions and how to interpret the results.

6.1.4. Concurrent or channelized testing

The procedures described in this document are only directly applicable to single stream performance measurement, e.g. one TCP connection.  In an ideal world, we would disallow all performance claims based on multiple concurrent streams, but this is not practical due to at least two different issues.  First, many very high rate link technologies are channelized and pin individual flows to specific channels to minimize reordering or other problems, and second, TCP itself has scaling limits.  Although the former problem might be overcome through different design decisions, the latter problem is more deeply rooted.

All congestion control algorithms that are philosophically aligned with the standard [RFC5681] (e.g. claim some level of TCP friendliness) have scaling limits, in the sense that as a long fast network (LFN) with a fixed RTT and MTU gets faster, these congestion control algorithms get less accurate and as a consequence have difficulty filling the network [CCscaling].  These properties are a consequence of the original Reno AIMD congestion control design and the requirement in [RFC5681] that all transport protocols have a uniform response to congestion.

There are a number of reasons to want to specify performance in terms of multiple concurrent flows, however this approach is not recommended for data rates below several megabits per second, which can be attained with run lengths under 10000 packets.  Since the required run length goes as the square of the data rate, at higher rates the run lengths can be unreasonably large, and multiple connections might be the only feasible approach.

If multiple connections are deemed necessary to meet aggregate performance targets then this MUST be stated both in the design of the TDS and in any claims about network performance.  The tests MUST be performed concurrently with the specified number of connections.  For the tests that use bursty traffic, the bursts should be synchronized across flows.

6.2. Interpreting the Results

6.2.1. Test outcomes

To perform an exhaustive test of an end-to-end network path, each test of the TDS is applied to each subpath of an end-to-end path.  If any subpath fails any test then an application running over the end-to-end path can also be expected to fail to attain the target performance under some conditions.

In addition to passing or failing, a test can be deemed to be inconclusive for a number of reasons.  Proper instrumentation and treatment of inconclusive outcomes is critical to the accuracy and robustness of Model Based Metrics.
Tests can be inconclusive if the precomputed traffic pattern or data rates were not accurately generated; the measurement results were not statistically significant; or for other causes, such as failing to meet some required preconditions for the test.

For example consider a test that implements Constant Window Pseudo CBR (Section 6.1.2) by adding rate controls and detailed traffic instrumentation to TCP (e.g. [RFC4898]).  TCP includes built in control systems which might interfere with the sending data rate.  If such a test meets the required delivery statistics (e.g. run length) while failing to attain the specified data rate, it must be treated as an inconclusive result, because we cannot a priori determine if the reduced data rate was caused by a TCP problem or a network problem, or if the reduced data rate had a material effect on the observed delivery statistics.

Note that for load tests, if the observed delivery statistics fail to meet the targets, the test can be considered to have failed, because it does not matter that the test did not attain the required data rate.

The really important new properties of MBM, such as vantage independence, are a direct consequence of opening the control loops in the protocols, such that the test traffic does not depend on network conditions or traffic received.  Any mechanism that introduces feedback between the path's measurements and the traffic generation is at risk of introducing nonlinearities that spoil these properties.  Any exceptional event that indicates that such feedback has happened should cause the test to be considered inconclusive.

One way to view inconclusive tests is that they reflect situations where a test outcome is ambiguous between limitations of the network and some unknown limitation of the diagnostic test itself, which may have been caused by some uncontrolled feedback from the network.

Note that procedures that attempt to sweep the target parameter space to find the limits on some parameter, such as target_data_rate, are at risk of breaking the location independent properties of Model Based Metrics if the boundary between passing and inconclusive is at all sensitive to RTT.

One of the goals for evolving TDS designs will be to keep sharpening the distinction between inconclusive, passing and failing tests.  The criteria for passing, failing and inconclusive tests MUST be explicitly stated for every test in the TDS or FSTDS.

One of the goals of evolving the testing process, procedures, tools and measurement point selection should be to minimize the number of inconclusive tests.

It may be useful to keep raw data delivery statistics for deeper study of the behavior of the network path and to measure the tools themselves.  Raw delivery statistics can help to drive tool evolution.  Under some conditions it might be possible to reevaluate the raw data for satisfying alternate performance targets.  However it is important to guard against sampling bias and other implicit feedback which can cause false results and exhibit measurement point vantage sensitivity.

6.2.2. Statistical criteria for estimating run_length

When evaluating the observed run_length, we need to determine appropriate packet stream sizes and acceptable error levels for efficient measurement.
In practice, can we compare the empirically estimated packet loss and ECN marking probabilities with the targets as the sample size grows?  How large a sample is needed to say that the measurements of packet transfer indicate a particular run length is present?

The generalized measurement can be described as recursive testing: send packets (individually or in patterns) and observe the packet delivery performance (loss ratio or other metric, any marking we define).

As each packet is sent and measured, we have an ongoing estimate of the performance in terms of the ratio of packet loss or ECN marks to total packets (i.e. an empirical probability).  We continue to send until conditions support a conclusion or a maximum sending limit has been reached.

We have a target_mark_probability, 1 mark per target_run_length, where a "mark" is defined as a lost packet, a packet with ECN mark, or other signal.  This constitutes the null Hypothesis:

H0: no more than one mark in target_run_length = 3*(target_pipe_size)^2 packets

and we can stop sending packets if ongoing measurements support accepting H0 with the specified Type I error = alpha (= 0.05 for example).

We also have an alternative Hypothesis to evaluate: that performance is significantly lower than the target_mark_probability.  Based on analysis of typical values and practical limits on measurement duration, we choose four times the H0 probability:

H1: one or more marks in (target_run_length/4) packets

and we can stop sending packets if measurements support rejecting H0 with the specified Type II error = beta (= 0.05 for example), thus preferring the alternate hypothesis H1.

H0 and H1 constitute the Success and Failure outcomes described elsewhere in the memo, and while the ongoing measurements do not support either hypothesis the current status of measurements is inconclusive.

The problem above is formulated to match the Sequential Probability Ratio Test (SPRT) [StatQC].  Note that as originally framed the events under consideration were all manufacturing defects.  In networking, ECN marks and lost packets are not defects but signals, indicating that the transport protocol should slow down.

The Sequential Probability Ratio Test also starts with a pair of hypotheses specified as above:

H0: p0 = one defect in target_run_length
H1: p1 = one defect in target_run_length/4

As packets are sent and measurements collected, the tester evaluates the cumulative defect count against two boundaries representing H0 Acceptance or Rejection (and acceptance of H1):

Acceptance line: Xa = -h1 + s*n
Rejection line:  Xr = h2 + s*n

where n increases linearly for each packet sent and

h1 = { log((1-alpha)/beta) }/k
h2 = { log((1-beta)/alpha) }/k
k  = log{ (p1(1-p0)) / (p0(1-p1)) }
s  = [ log{ (1-p0)/(1-p1) } ]/k

for p0 and p1 as defined in the null and alternative Hypotheses statements above, and alpha and beta as the Type I and Type II errors.

The SPRT specifies simple stopping rules:

o Xa < defect_count(n) < Xr: continue testing
o defect_count(n) <= Xa: Accept H0
o defect_count(n) >= Xr: Accept H1

The calculations above are implemented in the R-tool for Statistical Analysis [Rtool], in the add-on package for Cross-Validation via Sequential Testing (CVST) [CVST].
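As a purely illustrative aid (the reference implementation is the R CVST package cited above), the boundary and stopping rule calculations can be sketched in Python.  The alpha = beta = 0.05 values follow the examples in the text; the function names and the example target_run_length are hypothetical.

   from math import log, ceil

   def sprt_bounds(target_run_length, alpha=0.05, beta=0.05):
       p0 = 1.0 / target_run_length      # H0: one mark per target_run_length
       p1 = 4.0 / target_run_length      # H1: one mark per target_run_length/4
       k  = log((p1 * (1 - p0)) / (p0 * (1 - p1)))
       h1 = log((1 - alpha) / beta) / k
       h2 = log((1 - beta) / alpha) / k
       s  = log((1 - p0) / (1 - p1)) / k
       return h1, h2, s

   def sprt_decision(n, defect_count, h1, h2, s):
       """Return 'pass' (accept H0), 'fail' (accept H1) or 'continue'."""
       if defect_count <= -h1 + s * n:   # at or below the acceptance line
           return "pass"
       if defect_count >= h2 + s * n:    # at or above the rejection line
           return "fail"
       return "continue"

   # Hypothetical example: target_run_length = 14700 packets and zero marks
   # observed; H0 can be accepted once n reaches h1/s (see the minimum test
   # length calculation below).
   h1, h2, s = sprt_bounds(14700)
   print(ceil(h1 / s))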
1056 Using the equations above, we can calculate the minimum number of 1057 packets (n) needed to accept H0 when x defects are observed. For 1058 example, when x = 0:

1060 Xa = 0 = -h1 + s*n
1061 and thus n = h1 / s

1063 6.2.3. Reordering Tolerance

1065 All tests must be instrumented for packet level reordering [RFC4737]. 1066 However, there is no consensus about how much reordering should be 1067 acceptable. Over the last two decades the general trend has been to 1068 make protocols and applications more tolerant of reordering (see for 1069 example [RFC4015]), in response to the gradual increase in reordering 1070 in the network. This increase has been due to the deployment of 1071 technologies such as multi threaded routing lookups and Equal Cost 1072 MultiPath (ECMP) routing. These techniques increase parallelism in the 1073 network and are critical to enabling overall Internet growth to 1074 exceed Moore's Law.

1076 Note that transport retransmission strategies can trade off 1077 reordering tolerance against how quickly they can repair losses and the 1078 overhead from spurious retransmissions. In advance of new 1079 retransmission strategies we propose the following strawman: 1080 transport protocols should be able to adapt to reordering as long as 1081 the reordering extent is no more than the maximum of one quarter 1082 window or 1 ms, whichever is larger. Within this limit on reordering 1083 extent, there should be no bound on reordering density.

1085 By implication, reordering which is within these bounds should not 1086 be treated as a network impairment. However [RFC4737] still applies: 1087 reordering should be instrumented and the maximum reordering that can 1088 be properly characterized by the test (e.g. bound on history buffers) 1089 should be recorded with the measurement results.

1091 Reordering tolerance and diagnostic limitations, such as history 1092 buffer size, MUST be specified in a FSTDS.

1094 6.3. Test Preconditions

1096 Many tests have preconditions which are required to assure their 1097 validity. Examples include the presence or absence of cross traffic 1098 on specific subpaths, or appropriate preloading to put reactive 1099 network elements into the proper state [RFC7312]. If preconditions 1100 are not properly satisfied for some reason, the tests should be 1101 considered to be inconclusive. In general it is useful to preserve 1102 diagnostic information about why the preconditions were not met, and 1103 any test data that was collected even if it is not useful for the 1104 intended test. Such diagnostic information and partial test data may 1105 be useful for improving the test in the future.

1107 It is important to preserve the record that a test was scheduled, 1108 because otherwise precondition enforcement mechanisms can introduce 1109 sampling bias. For example, canceling tests due to cross traffic on 1110 subscriber access links might introduce sampling bias in tests of the 1111 rest of the network by reducing the number of tests during peak 1112 network load.

1114 Test preconditions and failure actions MUST be specified in a FSTDS.

1116 7. Diagnostic Tests

1118 The diagnostic tests below are organized by traffic pattern: basic 1119 data rate and delivery statistics, standing queues, slowstart bursts, 1120 and sender rate bursts. We also introduce some combined tests which 1121 are more efficient when networks are expected to pass, but conflate 1122 diagnostic signatures when they fail.

1124 There are a number of test details which are not fully defined here.
1125 They must be fully specified in a FSTDS. From a standardization 1126 perspective, this lack of specificity will weaken this version of 1127 Model Based Metrics, however it is anticipated that this will be more 1128 than offset by the extent to which MBM suppresses the problems caused 1129 by using transport protocols for measurement; e.g. non-specific MBM 1130 metrics are likely to have better repeatability than many existing 1131 BTC-like metrics. Once we have good field experience, the missing 1132 details can be fully specified.

1134 7.1. Basic Data Rate and Delivery Statistics Tests

1136 We propose several versions of the basic data rate and delivery 1137 statistics test. All measure the number of packets delivered between 1138 losses or ECN marks, using a data stream that is rate controlled at 1139 or below the target_data_rate.

1141 The tests below differ in how the data rate is controlled. The data 1142 can be paced on a timer, or window controlled at the full target data 1143 rate. The first two tests implicitly confirm that the subpath has 1144 sufficient raw capacity to carry the target_data_rate. They are 1145 recommended for relatively infrequent testing, such as an installation 1146 or periodic auditing process. The third, background delivery 1147 statistics, is a low rate test designed for ongoing monitoring for 1148 changes in subpath quality.

1150 All rely on the receiver accumulating packet delivery statistics as 1151 described in Section 6.2.2 to score the outcome:

1153 Pass: it is statistically significant that the observed interval 1154 between losses or ECN marks is larger than the target_run_length.

1156 Fail: it is statistically significant that the observed interval 1157 between losses or ECN marks is smaller than the target_run_length.

1159 A test is considered to be inconclusive if it failed to meet the data 1160 rate as specified below, failed to meet the qualifications defined in 1161 Section 6.3, or if neither run length statistical hypothesis was 1162 confirmed in the allotted test duration.

1164 7.1.1. Delivery Statistics at Paced Full Data Rate

1166 Confirm that the observed run length is at least the 1167 target_run_length while relying on a timer to send data at the 1168 target_rate, using the procedure described in Section 6.1.1 with a 1169 burst size of 1 (single packets) or 2 (packet pairs).

1171 The test is considered to be inconclusive if the packet transmission 1172 cannot be accurately controlled for any reason.

1174 RFC 6673 [RFC6673] is appropriate for measuring delivery statistics 1175 at full data rate.

1177 7.1.2. Delivery Statistics at Full Data Windowed Rate

1179 Confirm that the observed run length is at least the 1180 target_run_length while sending at an average rate approximately 1181 equal to the target_data_rate, by controlling (or clamping) the 1182 window size of a conventional transport protocol to a fixed value 1183 computed from the properties of the test path, typically 1184 test_window=target_data_rate*test_RTT/target_MTU. Note that if there 1185 is any interaction between the forward and return path, test_window 1186 may need to be adjusted slightly to compensate for the resulting 1187 inflated RTT.

1189 Since losses and ECN marks generally cause transport protocols to at 1190 least temporarily reduce their data rates, this test is expected to 1191 be less precise about controlling its data rate. It should not be 1192 considered inconclusive as long as at least some of the round trips 1193 reached the full target_data_rate without incurring losses or ECN 1194 marks. To pass this test the network MUST deliver target_pipe_size 1195 packets in target_RTT time without any losses or ECN marks at least 1196 once per two target_pipe_size round trips, in addition to meeting the 1197 run length statistical test.
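As a non-normative illustration of the window clamp above, the following Python helper computes test_window; it assumes the data rate is expressed in bits per second, the RTT in seconds and the MTU in bytes (the formula itself does not prescribe units):

   import math

   def test_window(target_data_rate, test_rtt, target_mtu):
       # test_window = target_data_rate * test_RTT / target_MTU,
       # rounded up to a whole number of packets.
       return math.ceil(target_data_rate * test_rtt / (target_mtu * 8))

   # Hypothetical test path: a 2.5 Mb/s target over a 10 ms test_RTT with
   # a 1500 byte MTU gives a clamp of 3 packets.
   # test_window(2.5e6, 0.010, 1500)  ->  3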
1199 7.1.3. Background Delivery Statistics Tests

1201 The background run length is a low rate version of the target 1202 rate test above, designed for ongoing lightweight monitoring for 1203 changes in the observed subpath run length without disrupting users. 1204 It should be used in conjunction with one of the above full rate 1205 tests because it does not confirm that the subpath can support the raw 1206 data rate.

1208 RFC 6673 [RFC6673] is appropriate for measuring background delivery 1209 statistics.

1211 7.2. Standing Queue Tests

1213 These engineering tests confirm that the bottleneck is well behaved 1214 across the onset of packet loss, which typically follows after the 1215 onset of queueing. Well behaved generally means lossless for 1216 transient queues, but once the queue has been sustained for a 1217 sufficient period of time (or reaches a sufficient queue depth) there 1218 should be a small number of losses to signal to the transport 1219 protocol that it should reduce its window. Losses that are too early 1220 can prevent the transport from averaging at the target_data_rate. 1221 Losses that are too late indicate that the queue might be subject to 1222 bufferbloat [wikiBloat] and inflict excess queuing delays on all 1223 flows sharing the bottleneck queue. Excess losses (more than half of 1224 the window) at the onset of congestion make loss recovery problematic 1225 for the transport protocol. Non-linear, erratic or excessive RTT 1226 increases suggest poor interactions between the channel acquisition 1227 algorithms and the transport self clock. All of the tests in this 1228 section use the same basic scanning algorithm, described here, but 1229 score the link on the basis of how well it avoids each of these 1230 problems.

1232 For some technologies the data might not be subject to increasing 1233 delays, in which case the data rate will vary with the window size 1234 all the way up to the onset of load induced losses or ECN marks. For 1235 these technologies, the discussion of queueing does not apply, but 1236 it is still required that the onset of losses or ECN marks be at an 1237 appropriate point and progressive.

1239 Use the procedure in Section 6.1.3 to sweep the window across the 1240 onset of queueing and the onset of loss. The tests below all assume 1241 that the scan emulates standard additive increase and delayed ACK by 1242 incrementing the window by one packet for every 2*target_pipe_size 1243 packets delivered. A scan can typically be divided into three 1244 regions: below the onset of queueing, a standing queue, and at or 1245 beyond the onset of loss.

1247 Below the onset of queueing the RTT is typically fairly constant, and 1248 the data rate varies in proportion to the window size. Once the data 1249 rate reaches the link rate, the data rate becomes fairly constant, 1250 and the RTT increases in proportion to the increase in window size. 1251 The precise transition across the start of queueing can be identified 1252 by the maximum network power, defined to be the ratio of the data rate 1253 to the RTT. The network power can be computed at each window size, and 1254 the window with the maximum is taken as the start of the queueing 1255 region.
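A minimal, purely illustrative sketch of this selection step is shown below; it assumes the scan has been reduced to a list of (window, data_rate, RTT) samples, which is not a format defined by this document:

   def onset_of_queueing(scan):
       # scan: list of (window, data_rate, rtt) samples from the window sweep.
       # Returns the window with the maximum network power (data_rate / RTT),
       # i.e. the start of the queueing region.
       best_window, best_power = None, float("-inf")
       for window, data_rate, rtt in scan:
           power = data_rate / rtt
           if power > best_power:
               best_window, best_power = window, power
       return best_window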
1257 For technologies that do not have conventional queues, start the scan 1258 at a window equal to test_window=target_data_rate*test_RTT/ 1259 target_MTU, i.e. starting at the target rate, instead of the power 1260 point.

1262 If there is random background loss (e.g. bit errors), precise 1263 determination of the onset of queue induced packet loss may require 1264 multiple scans. Above the onset of queueing loss, all transport 1265 protocols are expected to experience periodic losses determined by 1266 the interaction between the congestion control and AQM algorithms. 1267 For standard congestion control algorithms the periodic losses are 1268 likely to be relatively widely spaced and the details are typically 1269 dominated by the behavior of the transport protocol itself. For the 1270 stiffened transport protocol case (with non-standard, aggressive 1271 congestion control algorithms) the details of periodic losses will be 1272 dominated by how the window increase function responds to loss.

1274 7.2.1. Congestion Avoidance

1276 A link passes the congestion avoidance standing queue test if more 1277 than target_run_length packets are delivered between the onset of 1278 queueing (as determined by the window with the maximum network power) 1279 and the first loss or ECN mark. If this test is implemented using a 1280 standard congestion control algorithm with a clamp, it can be 1281 performed in situ in the production Internet as a capacity test. For 1282 an example of such a test see [Pathdiag].

1284 For technologies that do not have conventional queues, use the 1285 test_window in place of the onset of queueing, i.e. a link passes the 1286 congestion avoidance standing queue test if more than 1287 target_run_length packets are delivered between the start of the scan at 1288 test_window and the first loss or ECN mark.

1290 7.2.2. Bufferbloat

1292 This test confirms that there is some mechanism to limit buffer 1293 occupancy (e.g. that prevents bufferbloat). Note that this is not 1294 strictly a requirement for single stream bulk performance, however if 1295 there is no mechanism to limit buffer queue occupancy then a single 1296 stream with sufficient data to deliver is likely to cause the 1297 problems described in [RFC2309], [I-D.ietf-aqm-recommendation] and 1298 [wikiBloat]. This may cause only minor symptoms for the dominant 1299 flow, but has the potential to make the link unusable for other flows 1300 and applications.

1302 Pass if the onset of loss occurs before a standing queue has 1303 introduced more delay than twice the target_RTT, or some other well 1304 defined and specified limit. Note that there is not yet a model for 1305 how much standing queue is acceptable. The factor of two chosen here 1306 reflects a rule of thumb. In conjunction with the previous test, 1307 this test implies that the first loss should occur at a queueing 1308 delay which is between one and two times the target_RTT.

1310 Specified RTT limits that are larger than twice the target_RTT must 1311 be fully justified in the FSTDS.
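The pass criteria of the two preceding tests reduce to two simple predicates. The following Python fragment is illustrative only, and the variable names are hypothetical:

   def congestion_avoidance_pass(packets_before_first_mark, target_run_length):
       # Section 7.2.1: more than target_run_length packets must be delivered
       # between the onset of queueing and the first loss or ECN mark.
       return packets_before_first_mark > target_run_length

   def bufferbloat_pass(queueing_delay_at_first_loss, target_rtt, limit=2.0):
       # Section 7.2.2: the standing queue must add no more than (by default)
       # twice the target_RTT of delay before the onset of loss.
       return queueing_delay_at_first_loss <= limit * target_rtt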
1313 7.2.3. Non excessive loss

1315 This test confirms that the onset of loss is not excessive. Pass if 1316 losses are equal to or less than the increase in the cross traffic plus 1317 the test traffic window increase on the previous RTT. This could be 1318 restated as non-decreasing link throughput at the onset of loss, 1319 which is easy to meet as long as discarding packets is not more 1320 expensive than delivering them. (Note that when there is a transient drop 1321 in link throughput, outside of a standing queue test, a link that 1322 passes other queue tests in this document will have sufficient queue 1323 space to hold one RTT's worth of data.)

1325 Note that conventional Internet traffic policers will not pass this 1326 test, which is correct. TCP often fails to come into equilibrium at 1327 more than a small fraction of the available capacity, if the capacity 1328 is enforced by a policer. [Citation Pending].

1330 7.2.4. Duplex Self Interference

1332 This engineering test confirms a bound on the interactions between 1333 the forward data path and the ACK return path.

1335 Some historical half duplex technologies had the property that each 1336 direction held the channel until it had completely drained its queue. 1337 When a self clocked transport protocol, such as TCP, has data and 1338 ACKs passing in opposite directions through such a link, the behavior 1339 often reverts to stop-and-wait. Each additional packet added to the 1340 window raises the observed RTT by two forward path packet times, once 1341 as it passes through the data path, and once for the additional delay 1342 incurred by the ACK waiting on the return path.

1344 The duplex self interference test fails if the RTT rises by more than 1345 some fixed bound above the expected queueing time computed from 1346 the excess window divided by the link data rate. This bound must be 1347 smaller than target_RTT/2 to avoid reverting to stop and wait 1348 behavior. (e.g. packets have to be released at least twice per RTT 1349 to avoid stop and wait behavior.)

1351 7.3. Slowstart tests

1353 These tests mimic slowstart: data is sent at twice the effective 1354 bottleneck rate to exercise the queue at the dominant bottleneck.

1356 In general they are deemed inconclusive if the elapsed time to send 1357 the data burst is not less than half of the time to receive the ACKs 1358 (i.e. sending data too fast is acceptable, but sending it slower than twice 1359 the actual bottleneck rate as indicated by the ACKs is deemed 1360 inconclusive). Space the bursts such that the average data rate is 1361 equal to the target_data_rate.

1363 7.3.1. Full Window slowstart test

1365 This is a capacity test to confirm that slowstart is not likely to 1366 exit prematurely. Send slowstart bursts that are target_pipe_size 1367 total packets.

1369 Accumulate packet delivery statistics as described in Section 6.2.2 1370 to score the outcome. Pass if it is statistically significant that 1371 the observed number of good packets delivered between losses or ECN 1372 marks is larger than the target_run_length. Fail if it is 1373 statistically significant that the observed interval between losses 1374 or ECN marks is smaller than the target_run_length.

1376 Note that these are the same parameters as the Sender Full Window 1377 burst test, except the burst rate is at the slowstart rate, rather than 1378 the sender interface rate.

1380 7.3.2. Slowstart AQM test

1382 Do a continuous slowstart (send data continuously at slowstart_rate) 1383 until the first loss, stop, allow the network to drain and repeat, 1384 gathering statistics on the last packet delivered before the loss, 1385 the loss pattern, maximum observed RTT and window size. Justify the 1386 results.
There is not currently sufficient theory justifying 1387 requiring any particular result, however design decisions that affect 1388 the outcome of this test also affect how the network balances 1389 between long and short flows (the "mice and elephants" problem). The 1390 queueing delay at the time of the first loss should be at least one half of 1391 the target_RTT.

1393 This is an engineering test: it would be best performed on a 1394 quiescent network or testbed, since cross traffic has the potential 1395 to change the results.

1397 7.4. Sender Rate Burst tests

1399 These tests determine how well the network can deliver bursts sent at 1400 the sender's interface rate. Note that this test most heavily exercises 1401 the front path, and is likely to include infrastructure that may be out of 1402 scope for an access ISP, even though the bursts might be caused by 1403 ACK compression, thinning or channel arbitration in the access ISP. 1404 See Appendix B.

1406 Also, there are several details that are not precisely defined. 1407 For starters, there is not a standard server interface rate. 1 Gb/s 1408 and 10 Gb/s are very common today, but higher rates will become cost 1409 effective and can be expected to be dominant some time in the future.

1411 Current standards permit TCP to send full window bursts following 1412 an application pause. (Congestion Window Validation [RFC2861] is 1413 not required, but even if it was, it does not take effect until an 1414 application pause is longer than an RTO.) Since full window bursts 1415 are consistent with standard behavior, it is desirable that the 1416 network be able to deliver such bursts, otherwise application pauses 1417 will cause unwarranted losses. Note that the AIMD sawtooth requires 1418 a peak window that is twice target_pipe_size, so the worst case burst 1419 may be 2*target_pipe_size.

1421 It is also understood in the application and serving community that 1422 interface rate bursts have a cost to the network that has to be 1423 balanced against other costs in the servers themselves. For example 1424 TCP Segmentation Offload (TSO) reduces server CPU in exchange for 1425 larger network bursts, which increase the stress on network buffer 1426 memory.

1428 There is not yet theory to unify these costs or to provide a 1429 framework for trying to optimize global efficiency. We do not yet 1430 have a model for how much the network should tolerate server rate 1431 bursts. Some bursts must be tolerated by the network, but it is 1432 probably unreasonable to expect the network to be able to efficiently 1433 deliver all data as a series of bursts.

1435 For this reason, this is the only test for which we encourage 1436 derating. A TDS could include a table of pairs of derating 1437 parameters: what burst size to use as a fraction of the 1438 target_pipe_size, and how much each burst size is permitted to reduce 1439 the run length, relative to the target_run_length.

1441 7.5. Combined and Implicit Tests

1443 Combined tests efficiently confirm multiple network properties in a 1444 single test, possibly as a side effect of normal content delivery. 1445 They require less measurement traffic than other testing strategies 1446 at the cost of conflating diagnostic signatures when they fail. 1447 These are by far the most efficient for monitoring networks that are 1448 nominally expected to pass all tests.

1450 7.5.1. Sustained Bursts Test

1452 The sustained burst test implements a combined worst case version of 1453 all of the load tests above. It is simply:

1455 Send target_pipe_size bursts of packets at server interface rate with 1456 target_RTT headway (burst start to burst start). Verify that the 1457 observed delivery statistics meet the target_run_length.
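The traffic pattern itself is easy to approximate. The toy Python sender below is illustrative only: it uses UDP, ignores the receiver-side scoring described in Section 6.2.2, makes no attempt at true interface-rate pacing, and the destination address, payload size and duration are arbitrary assumptions of this sketch; a real tester would use instrumented TCP [RFC4898], for example the tools in [MBMSource].

   import socket
   import time

   def sustained_bursts(dst, target_pipe_size, target_rtt,
                        payload_size=1448, duration=10.0):
       # Send target_pipe_size back-to-back packets every target_RTT seconds
       # (burst start to burst start), for `duration` seconds.
       sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
       payload = bytes(payload_size)
       seq = 0
       next_burst = time.monotonic()
       end = next_burst + duration
       while next_burst < end:
           for _ in range(target_pipe_size):
               sock.sendto(seq.to_bytes(4, "big") + payload, dst)
               seq += 1
           next_burst += target_rtt
           time.sleep(max(0.0, next_burst - time.monotonic()))

   # Example with the Section 8 parameters and a hypothetical receiver:
   # sustained_bursts(("192.0.2.1", 9000), 11, 0.050)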
1459 Key observations:
1460 o The subpath under test is expected to go idle for some fraction of 1461 the time: (subpath_data_rate-target_rate)/subpath_data_rate. 1462 Failing to do so indicates a problem with the procedure and an 1463 inconclusive test result.
1464 o The burst sensitivity can be derated by sending smaller bursts 1465 more frequently. E.g. send target_pipe_size*derate packet bursts 1466 every target_RTT*derate.
1467 o When not derated, this test is the most strenuous load test.
1468 o A link that passes this test is likely to be able to sustain 1469 higher rates (close to subpath_data_rate) for paths with RTTs 1470 significantly smaller than the target_RTT.
1471 o This test can be implemented with instrumented TCP [RFC4898], 1472 using a specialized measurement application at one end [MBMSource] 1473 and a minimal service at the other end [RFC0863] [RFC0864].
1474 o This test is efficient to implement, since it does not require 1475 per-packet timers, and can make use of TSO in modern NIC hardware.
1476 o This test by itself is not sufficient: the standing window 1477 engineering tests are also needed to ensure that the link is well 1478 behaved at and beyond the onset of congestion.
1479 o Assuming the link passes relevant standing window engineering 1480 tests (particularly that it has a progressive onset of loss at an 1481 appropriate queue depth), a passing sustained burst test is 1482 (believed to be) sufficient to verify that the subpath will not 1483 impair streams running at the target performance under all conditions. 1484 Proving this statement will be the subject of ongoing research.

1486 Note that this test is clearly independent of the subpath RTT, or 1487 other details of the measurement infrastructure, as long as the 1488 measurement infrastructure can accurately and reliably deliver the 1489 required bursts to the subpath under test.

1491 7.5.2. Streaming Media

1493 Model Based Metrics can be implicitly implemented as a side effect of 1494 serving any non-throughput maximizing traffic, such as streaming 1495 media, with some additional controls and instrumentation in the 1496 servers. The essential requirement is that the traffic be 1497 constrained such that even with arbitrary application pauses, bursts 1498 and data rate fluctuations, the traffic stays within the envelope 1499 defined by the individual tests described above.

1501 If the application's serving_data_rate is less than or equal to the 1502 target_data_rate and the serving_RTT (the RTT between the sender and 1503 client) is less than the target_RTT, this constraint is most easily 1504 implemented by clamping the transport window size to be no larger 1505 than:

1507 serving_window_clamp=target_data_rate*serving_RTT/ 1508 (target_MTU-header_overhead)

1510 Under the above constraints the serving_window_clamp will limit 1511 both the serving data rate and burst sizes to be no larger than those 1512 called for by the procedures in Section 7.1.2 and Section 7.4 or Section 7.5.1. 1513 Since the serving RTT is smaller than the target_RTT, the worst case bursts 1514 that might be generated under these conditions will be smaller than 1515 called for by Section 7.4 and the sender rate burst sizes are 1516 implicitly derated by at least the serving_window_clamp divided by the 1517 target_pipe_size. (Depending on the application 1518 behavior, the data traffic might be significantly smoother than 1519 specified by any of the burst tests.)
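As a non-normative illustration, the clamp can be computed as follows; the unit conventions and the 30 ms serving RTT in the example are assumptions of this sketch:

   import math

   def serving_window_clamp(target_data_rate, serving_rtt,
                            target_mtu, header_overhead):
       # serving_window_clamp = target_data_rate * serving_RTT /
       #                        (target_MTU - header_overhead),
       # with the rate in bits/s, RTT in seconds and sizes in bytes,
       # rounded up to whole packets.
       payload = target_mtu - header_overhead
       return math.ceil(target_data_rate * serving_rtt / (payload * 8))

   # Example: 2.5 Mb/s target, 30 ms serving RTT (hypothetical), 1500 byte
   # MTU and 64 bytes of header overhead give a clamp of about 7 packets.
   # serving_window_clamp(2.5e6, 0.030, 1500, 64)  ->  7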
1521 Note that it is important that the target_data_rate be above the 1522 actual average rate needed by the application so it can recover after 1523 transient pauses caused by congestion or the application itself.

1525 In an alternative implementation the data rate and bursts might be 1526 explicitly controlled by a host shaper or pacing at the sender. This 1527 would provide better control over transmissions but it is 1528 substantially more complicated to implement and would be likely to 1529 have a higher CPU overhead.

1531 Note that these techniques can be applied to any content delivery 1532 that can be subjected to a reduced data rate in order to inhibit TCP 1533 equilibrium behavior.

1535 8. An Example

1537 In this section we illustrate a TDS designed to confirm that an 1538 access ISP can reliably deliver HD video from multiple content 1539 providers to all of their customers. With modern codecs, minimal HD 1540 video (720p) generally fits in 2.5 Mb/s. Due to their geographical 1541 size, network topology and modem designs the ISP determines that most 1542 content is within a 50 ms RTT of their users. (This is sufficient 1543 to cover continental Europe or either US coast from a single serving 1544 site.)

1545 2.5 Mb/s over a 50 ms path

1547 +----------------------+-------+---------+
1548 | End to End Parameter | value | units   |
1549 +----------------------+-------+---------+
1550 | target_rate          | 2.5   | Mb/s    |
1551 | target_RTT           | 50    | ms      |
1552 | target_MTU           | 1500  | bytes   |
1553 | header_overhead      | 64    | bytes   |
1554 | target_pipe_size     | 11    | packets |
1555 | target_run_length    | 363   | packets |
1556 +----------------------+-------+---------+

1558 Table 1

1560 Table 1 shows the default TCP model with no derating, and as such is 1561 quite conservative. The simplest TDS would be to use the sustained 1562 burst test, described in Section 7.5.1. Such a test would send 11 1563 packet bursts every 50 ms, and confirm that there is no more than 1564 1 packet loss per 33 bursts (363 total packets in 1.650 seconds).

1566 Since this number represents the entire end-to-end loss budget, 1567 independent subpath tests could be implemented by apportioning the 1568 loss rate across subpaths. For example 50% of the losses might be 1569 allocated to the access or last mile link to the user, 40% to the 1570 interconnects with other ISPs and 1% to each internal hop (assuming 1571 no more than 10 internal hops). Then all of the subpaths can be 1572 tested independently, and the spatial composition of passing subpaths 1573 would be expected to be within the end-to-end loss budget.

1575 Testing interconnects has generally been problematic: conventional 1576 performance tests, run between Measurement Points adjacent to either 1577 side of the interconnect, are not generally useful. Unconstrained 1578 TCP tests, such as iperf [iperf], are usually overly aggressive 1579 because the RTT is so small (often less than 1 ms). With a short RTT 1580 these tools are likely to report inflated numbers because for short 1581 RTTs these tools can tolerate very high loss rates and can push 1582 other cross traffic off of the network. As a consequence they are 1583 useless for predicting actual user performance, and may themselves be 1584 quite disruptive. Model Based Metrics solves this problem.
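The numbers used in this example are easy to recompute. The following Python fragment is illustrative only (the unit conventions and the rounding to whole bursts are assumptions of this sketch); it reproduces Table 1 and the 40% interconnect budget used below:

   import math

   target_rate = 2.5e6      # bits per second
   target_rtt = 0.050       # seconds
   target_mtu = 1500        # bytes

   target_pipe_size = math.ceil(target_rate * target_rtt / (target_mtu * 8))
   target_run_length = 3 * target_pipe_size ** 2
   # target_pipe_size  -> 11 packets
   # target_run_length -> 363 packets

   # Sustained burst test: 11 packet bursts every 50 ms, at most one loss
   # per 363 / 11 = 33 bursts (1.650 seconds of traffic).
   bursts_per_loss = target_run_length // target_pipe_size        # 33

   # Apportioning 40% of the loss budget to an interconnect stretches the
   # required interval between losses to 363 / 0.40 packets, i.e. 82 whole
   # bursts or 902 packets.
   interconnect_bursts = int((target_run_length / 0.40) // target_pipe_size)
   interconnect_packets = interconnect_bursts * target_pipe_size  # 902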
The same 1585 test pattern as used on other links can be applied to the 1586 interconnect. For our example, when apportioned 40% of the losses, 1587 11 packet bursts sent every 50 ms should have fewer than one loss per 1588 82 bursts (902 packets).

1590 9. Validation

1592 Since some aspects of the models are likely to be too conservative, 1593 Section 5.2 permits alternate protocol models and Section 5.3 permits 1594 test parameter derating. If either of these techniques is used, we 1595 require demonstrations that such a TDS can robustly detect links that 1596 will prevent authentic applications using state-of-the-art protocol 1597 implementations from meeting the specified performance targets. This 1598 correctness criterion is potentially difficult to prove, because it 1599 implicitly requires validating a TDS against all possible links and 1600 subpaths. The procedures described here are still experimental.

1602 We suggest two approaches, both of which should be applied: first, 1603 publish a fully open description of the TDS, including what 1604 assumptions were used and how it was derived, such that the 1605 research community can evaluate the design decisions, test them and 1606 comment on their applicability; and second, demonstrate that 1607 applications running over an infinitesimally passing testbed do meet 1608 the performance targets.

1610 An infinitesimally passing testbed resembles an epsilon-delta proof 1611 in calculus. Construct a test network such that all of the 1612 individual tests of the TDS pass by only small (infinitesimal) 1613 margins, and demonstrate that a variety of authentic applications 1614 running over real TCP implementations (or other protocols as 1615 appropriate) meet the end-to-end target parameters over such a 1616 network. The workloads should include multiple types of streaming 1617 media and transaction oriented short flows (e.g. synthetic web 1618 traffic).

1620 For example, for the HD streaming video TDS described in Section 8, 1621 the link layer bottleneck data rate should be exactly the header 1622 overhead above 2.5 Mb/s, the per packet random background loss 1623 probability should be 1/363, for a run length of 363 packets, the 1624 bottleneck queue should be 11 packets and the front path should have 1625 just enough buffering to withstand 11 packet interface rate bursts. 1626 We want every one of the TDS tests to fail if we slightly increase 1627 the relevant test parameter, so for example sending 12 packet 1628 bursts should cause excess (possibly deterministic) packet drops at 1629 the dominant queue at the bottleneck. On this infinitesimally 1630 passing network it should be possible for a real application using a 1631 stock TCP implementation in the vendor's default configuration to 1632 attain 2.5 Mb/s over a 50 ms path.

1634 The most difficult part of setting up such a testbed is arranging for 1635 it to infinitesimally pass the individual tests. Two approaches: 1636 constraining the network devices not to use all available resources 1637 (e.g. by limiting available buffer space or data rate); and 1638 preloading subpaths with cross traffic. Note that it is important 1639 that a single environment be constructed which infinitesimally 1640 passes all tests at the same time, otherwise there is a chance that 1641 TCP can exploit extra latitude in some parameters (such as data rate) 1642 to partially compensate for constraints in other parameters (such as queue 1643 space), or vice versa.
1645 To the extent that a TDS is used to inform public dialog, it should be 1646 fully publicly documented, including the details of the tests, what 1647 assumptions were used and how it was derived. All of the details of 1648 the validation experiment should also be published with sufficient 1649 detail for the experiments to be replicated by other researchers. 1650 All components should be either open source or fully described 1651 proprietary implementations that are available to the research 1652 community.

1654 10. Security Considerations

1656 Measurement is often used to inform business and policy decisions, 1657 and as a consequence is potentially subject to manipulation for 1658 illicit gains. Model Based Metrics are expected to be a huge step 1659 forward because equivalent measurements can be performed from 1660 multiple vantage points, such that performance claims can be 1661 independently validated by multiple parties.

1663 Much of the acrimony in the Net Neutrality debate is due to the 1664 historical lack of any effective vantage independent tools to 1665 characterize network performance. Traditional methods for measuring 1666 bulk transport capacity are sensitive to RTT and as a consequence 1667 often yield very different results local to an ISP and end-to-end. 1668 Neither the ISP nor customer can repeat the other's measurements, 1669 leading to high levels of distrust and acrimony. Model Based Metrics 1670 are expected to greatly improve this situation.

1672 This document only describes a framework for designing Fully 1673 Specified Targeted Diagnostic Suites. Each FSTDS MUST include its own 1674 security section.

1676 11. Acknowledgements

1678 Ganga Maguluri suggested the statistical test for measuring loss 1679 probability in the target run length. Alex Gilgur helped with 1680 the statistics.

1682 Meredith Whittaker improved the clarity of the communications.

1684 This work was inspired by Measurement Lab: open tools running on an 1685 open platform, using open tools to collect open data. See 1686 http://www.measurementlab.net/

1688 12. IANA Considerations

1690 This document has no actions for IANA.

1692 13. References

1694 13.1. Normative References

1696 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1697 Requirement Levels", BCP 14, RFC 2119, March 1997.

1699 13.2. Informative References

1701 [RFC0863] Postel, J., "Discard Protocol", STD 21, RFC 863, May 1983.

1703 [RFC0864] Postel, J., "Character Generator Protocol", STD 22, 1704 RFC 864, May 1983.

1706 [RFC2309] Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering, 1707 S., Estrin, D., Floyd, S., Jacobson, V., Minshall, G., 1708 Partridge, C., Peterson, L., Ramakrishnan, K., Shenker, 1709 S., Wroclawski, J., and L. Zhang, "Recommendations on 1710 Queue Management and Congestion Avoidance in the 1711 Internet", RFC 2309, April 1998.

1713 [RFC2330] Paxson, V., Almes, G., Mahdavi, J., and M. Mathis, 1714 "Framework for IP Performance Metrics", RFC 2330, 1715 May 1998.

1717 [RFC2861] Handley, M., Padhye, J., and S. Floyd, "TCP Congestion 1718 Window Validation", RFC 2861, June 2000.

1720 [RFC3148] Mathis, M. and M. Allman, "A Framework for Defining 1721 Empirical Bulk Transfer Capacity Metrics", RFC 3148, 1722 July 2001.

1724 [RFC3465] Allman, M., "TCP Congestion Control with Appropriate Byte 1725 Counting (ABC)", RFC 3465, February 2003.

1727 [RFC4015] Ludwig, R. and A. Gurtov, "The Eifel Response Algorithm 1728 for TCP", RFC 4015, February 2005.
1730 [RFC4737] Morton, A., Ciavattone, L., Ramachandran, G., Shalunov, 1731 S., and J. Perser, "Packet Reordering Metrics", RFC 4737, 1732 November 2006.

1734 [RFC4898] Mathis, M., Heffner, J., and R. Raghunarayan, "TCP 1735 Extended Statistics MIB", RFC 4898, May 2007.

1737 [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion 1738 Control", RFC 5681, September 2009.

1740 [RFC5835] Morton, A. and S. Van den Berghe, "Framework for Metric 1741 Composition", RFC 5835, April 2010.

1743 [RFC6049] Morton, A. and E. Stephan, "Spatial Composition of 1744 Metrics", RFC 6049, January 2011.

1746 [RFC6673] Morton, A., "Round-Trip Packet Loss Metrics", RFC 6673, 1747 August 2012.

1749 [RFC7312] Fabini, J. and A. Morton, "Advanced Stream and Sampling 1750 Framework for IP Performance Metrics (IPPM)", RFC 7312, 1751 August 2014.

1753 [RFC7398] Bagnulo, M., Burbridge, T., Crawford, S., Eardley, P., and 1754 A. Morton, "A Reference Path and Measurement Points for 1755 Large-Scale Measurement of Broadband Performance", 1756 RFC 7398, February 2015.

1758 [I-D.ietf-aqm-recommendation] 1759 Baker, F. and G. Fairhurst, "IETF Recommendations 1760 Regarding Active Queue Management", 1761 draft-ietf-aqm-recommendation-11 (work in progress), 1762 February 2015.

1764 [MSMO97] Mathis, M., Semke, J., Mahdavi, J., and T. Ott, "The 1765 Macroscopic Behavior of the TCP Congestion Avoidance 1766 Algorithm", Computer Communications Review volume 27, 1767 number 3, July 1997.

1769 [WPING] Mathis, M., "Windowed Ping: An IP Level Performance 1770 Diagnostic", INET 94, June 1994.

1772 [mpingSource] 1773 Fan, X., Mathis, M., and D. Hamon, "Git Repository for 1774 mping: An IP Level Performance Diagnostic", Sept 2013, 1775 .

1777 [MBMSource] 1778 Hamon, D., Stuart, S., and H. Chen, "Git Repository for 1779 Model Based Metrics", Sept 2013, 1780 .

1782 [Pathdiag] 1783 Mathis, M., Heffner, J., O'Neil, P., and P. Siemsen, 1784 "Pathdiag: Automated TCP Diagnosis", Passive and Active 1785 Measurement, June 2008.

1787 [iperf] Wikipedia Contributors, "iPerf", Wikipedia, The Free 1788 Encyclopedia, cited March 2015, .

1791 [StatQC] Montgomery, D., "Introduction to Statistical Quality 1792 Control - 2nd ed.", ISBN 0-471-51988-X, 1990.

1794 [Rtool] R Development Core Team, "R: A language and environment 1795 for statistical computing. R Foundation for Statistical 1796 Computing, Vienna, Austria. ISBN 3-900051-07-0, URL 1797 http://www.R-project.org/", 2011.

1799 [CVST] Krueger, T. and M. Braun, "R package: Fast Cross- 1800 Validation via Sequential Testing", version 0.1, November 2012.

1802 [AFD] Pan, R., Breslau, L., Prabhakar, B., and S. Shenker, 1803 "Approximate fairness through differential dropping", 1804 SIGCOMM Comput. Commun. Rev. 33, 2, April 2003.

1806 [wikiBloat] 1807 Wikipedia, "Bufferbloat", http://en.wikipedia.org/w/ 1808 index.php?title=Bufferbloat&oldid=608805474, March 2015.

1810 [CCscaling] 1811 Fernando, F., Doyle, J., and S. Steven, "Scalable laws for 1812 stable network congestion control", Proceedings of 1813 Conference on Decision and 1814 Control, http://www.ee.ucla.edu/~paganini, December 2001.

1816 Appendix A. Model Derivations

1818 The reference target_run_length described in Section 5.2 is based on 1819 very conservative assumptions: that any window in excess of target_pipe_size 1820 contributes to a standing queue that raises the RTT, and that classic 1821 Reno congestion control with delayed ACKs is in effect. In this 1822 section we provide two alternative calculations using different 1823 assumptions.
1825 It may seem out of place to allow such latitude in a measurement 1826 standard, but this section provides offsetting requirements.

1828 The estimates provided by these models make the most sense if network 1829 performance is viewed logarithmically. In the operational Internet, 1830 data rates span more than 8 orders of magnitude, RTT spans more than 1831 3 orders of magnitude, and loss probability spans at least 8 orders 1832 of magnitude. When viewed logarithmically (as in decibels), these 1833 correspond to 80 dB of dynamic range. On an 80 dB scale, a 3 dB 1834 error is less than 4% of the scale, even though it might represent a 1835 factor of 2 in the untransformed parameter.

1837 This document gives a lot of latitude for calculating 1838 target_run_length, however people designing a TDS should consider the 1839 effect of their choices on the ongoing tussle about the relevance of 1840 "TCP friendliness" as an appropriate model for Internet capacity 1841 allocation. Choosing a target_run_length that is substantially 1842 smaller than the reference target_run_length specified in Section 5.2 1843 strengthens the argument that it may be appropriate to abandon "TCP 1844 friendliness" as the Internet fairness model. This gives developers 1845 incentive and permission to develop even more aggressive applications 1846 and protocols, for example by increasing the number of connections 1847 that they open concurrently.

1849 A.1. Queueless Reno

1851 In Section 5.2 it was assumed that the link rate matches the target 1852 rate plus overhead, such that the excess window needed for the AIMD 1853 sawtooth causes a fluctuating queue at the bottleneck.

1855 An alternate situation would be a bottleneck where there is no 1856 significant queue and losses are caused by some mechanism that does 1857 not involve extra delay, for example by the use of a virtual queue as 1858 in Approximate Fair Dropping [AFD]. A flow controlled by such a 1859 bottleneck would have a constant RTT and a data rate that fluctuates 1860 in a sawtooth due to AIMD congestion control. Assume the losses are 1861 being controlled to make the average data rate meet some goal which 1862 is equal to or greater than the target_rate. The necessary run length 1863 can be computed as follows:

1865 For some value of Wmin, the window will sweep from Wmin packets to 1866 2*Wmin packets in 2*Wmin RTTs (due to delayed ACKs). Unlike the 1867 queueing case where Wmin = target_pipe_size, we want the average of 1868 Wmin and 2*Wmin to be the target_pipe_size, so the average rate is 1869 the target rate. Thus we want Wmin = (2/3)*target_pipe_size.

1871 Between losses each sawtooth delivers (1/2)*(Wmin+2*Wmin)*(2*Wmin) 1872 packets in 2*Wmin round trip times.

1874 Substituting these together we get:

1876 target_run_length = (4/3)*(target_pipe_size^2)

1878 Note that this is 44% of the reference_run_length computed earlier. 1879 This makes sense because under the assumptions in Section 5.2 the 1880 AIMD sawtooth caused a queue at the bottleneck, which raised the 1881 effective RTT by 50%.
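For comparison, the two models can be evaluated side by side. The following Python fragment is illustrative only; the example value of 11 packets is the target_pipe_size from Section 8:

   def reference_run_length(target_pipe_size):
       # Section 5.2 reference model.
       return 3 * target_pipe_size ** 2

   def queueless_reno_run_length(target_pipe_size):
       # Appendix A.1 queueless Reno model.
       return (4.0 / 3.0) * target_pipe_size ** 2

   # reference_run_length(11)       -> 363 packets
   # queueless_reno_run_length(11)  -> about 161 packets (roughly 44% of 363)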
1883 Appendix B. Complex Queueing

1885 For many network technologies simple queueing models don't apply: the 1886 network schedules, thins or otherwise alters the timing of ACKs and 1887 data, generally to raise the efficiency of the channel allocation 1888 when confronted with relatively widely spaced small ACKs. These 1889 efficiency strategies are ubiquitous for half duplex, wireless and 1890 broadcast media.

1892 Altering the ACK stream generally has two consequences: it raises the 1893 effective bottleneck data rate, causing slowstart to burst at higher 1894 rates (possibly as high as the sender's interface rate), and it 1895 effectively raises the RTT by the average time that the ACKs and data 1896 were delayed. The first effect can be partially mitigated by 1897 reclocking ACKs once they are beyond the bottleneck on the return 1898 path to the sender, however this further raises the effective RTT.

1900 The most extreme example of this sort of behavior would be a half 1901 duplex channel that is not released as long as the end point currently 1902 holding the channel has more traffic (data or ACKs) to send. Such 1903 environments cause self clocked protocols under full load to revert 1904 to extremely inefficient stop and wait behavior, where they send an 1905 entire window of data as a single burst on the forward path, followed 1906 by the entire window of ACKs on the return path. It is important to 1907 note that due to self clocking, ill conceived channel allocation 1908 mechanisms can increase the stress on upstream links in a long path: 1909 they cause larger and faster bursts.

1911 If a particular end-to-end path contains a link or device that alters 1912 the ACK stream, then the entire path from the sender up to the 1913 bottleneck must be tested at the burst parameters implied by the ACK 1914 scheduling algorithm. The most important parameter is the Effective 1915 Bottleneck Data Rate, which is the average rate at which the ACKs 1916 advance snd.una. Note that thinning the ACKs (relying on the 1917 cumulative nature of seg.ack to permit discarding some ACKs) 1918 implies an effectively infinite bottleneck data rate.

1920 Holding data or ACKs for channel allocation or other reasons (such as 1921 forward error correction) always raises the effective RTT relative to 1922 the minimum delay for the path. Therefore it may be necessary to 1923 replace target_RTT in the calculation in Section 5.2 by an 1924 effective_RTT, which includes the target_RTT plus a term to account 1925 for the extra delays introduced by these mechanisms.

1927 Appendix C. Version Control

1929 This section to be removed prior to publication.

1931 Formatted: Mon Mar 9 14:37:24 PDT 2015

1933 Authors' Addresses

1935 Matt Mathis
1936 Google, Inc
1937 1600 Amphitheater Parkway
1938 Mountain View, California 94043
1939 USA

1941 Email: mattmathis@google.com

1943 Al Morton
1944 AT&T Labs
1945 200 Laurel Avenue South
1946 Middletown, NJ 07748
1947 USA

1949 Phone: +1 732 420 1571
1950 Email: acmorton@att.com
1951 URI: http://home.comcast.net/~acmacm/