idnits 2.17.1 draft-ietf-ippm-model-based-metrics-03.txt:

Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here.

Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here.

Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack a Security Considerations section. ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) == There are 7 instances of lines with non-RFC2606-compliant FQDNs in the document. ** The document seems to lack a both a reference to RFC 2119 and the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. RFC 2119 keyword, line 578: '... A TDS or FSTDS MUST apportion all re...' RFC 2119 keyword, line 685: '...ecified TDS or FSTDS MUST document the...' RFC 2119 keyword, line 702: '...The TDS or FSTDS MUST document and jus...' RFC 2119 keyword, line 833: '...argets then this MUST be stated both t...' RFC 2119 keyword, line 834: '...etwork performance. The tests MUST be...' (2 more instances...)

Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year == Line 354 has weird spacing: '...y tests deter...' == Line 365 has weird spacing: '...g tests are d...' == Line 370 has weird spacing: '...g tests evalu...' == Line 1006 has weird spacing: '... and n = h1...' -- The document date (July 3, 2014) is 3575 days in the past. Is this intentional? -- Found something which looks like a code comment -- if you have code sections in the document, please surround them with '<CODE BEGINS>' and '<CODE ENDS>' lines.

Checking references for intended status: Experimental ---------------------------------------------------------------------------- == Missing Reference: 'Dominant' is mentioned on line 242, but not defined == Missing Reference: 'W' is mentioned on line 1859, but not defined -- Obsolete informational reference (is this intentional?): RFC 2309 (Obsoleted by RFC 7567) -- Obsolete informational reference (is this intentional?): RFC 2861 (Obsoleted by RFC 7661) == Outdated reference: A later version (-07) exists of draft-ietf-ippm-lmap-path-04

Summary: 3 errors (**), 0 flaws (~~), 9 warnings (==), 4 comments (--). Run idnits with the --verbose option for more detailed information about the items above. --------------------------------------------------------------------------------

2 IP Performance Working Group M. Mathis 3 Internet-Draft Google, Inc 4 Intended status: Experimental A. Morton 5 Expires: January 4, 2015 AT&T Labs 6 July 3, 2014

8 Model Based Bulk Performance Metrics 9 draft-ietf-ippm-model-based-metrics-03.txt

11 Abstract

13 We introduce a new class of model based metrics designed to determine 14 if an end-to-end Internet path can meet predefined transport 15 performance targets by applying a suite of IP diagnostic tests to 16 successive subpaths.
The subpath-at-a-time tests can be robustly 17 applied to key infrastructure, such as interconnects, to accurately 18 detect if it will prevent the full end-to-end paths that traverse it 19 from meeting the specified target performance.

21 Each IP diagnostic test consists of a precomputed traffic pattern and 22 statistical criteria for evaluating packet delivery. The traffic 23 patterns are precomputed to mimic TCP or another transport protocol 24 operating over a long path but are independent of the actual details of the 25 subpath under test. Likewise the success criteria depend on the 26 target performance for the long path and not the details of the 27 subpath. This makes the measurements open loop, which introduces 28 several important new properties and eliminates most of the 29 difficulties encountered by traditional bulk transport metrics.

31 This document does not define diagnostic tests, but provides a 32 framework for designing suites of diagnostic tests that are tailored 33 to confirming the target performance.

35 Interim DRAFT Formatted: Thu Jul 3 20:19:04 PDT 2014

37 Status of this Memo

39 This Internet-Draft is submitted in full conformance with the 40 provisions of BCP 78 and BCP 79.

42 Internet-Drafts are working documents of the Internet Engineering 43 Task Force (IETF). Note that other groups may also distribute 44 working documents as Internet-Drafts. The list of current Internet- 45 Drafts is at http://datatracker.ietf.org/drafts/current/.

47 Internet-Drafts are draft documents valid for a maximum of six months 48 and may be updated, replaced, or obsoleted by other documents at any 49 time. It is inappropriate to use Internet-Drafts as reference 50 material or to cite them other than as "work in progress."

52 This Internet-Draft will expire on January 4, 2015.

54 Copyright Notice

56 Copyright (c) 2014 IETF Trust and the persons identified as the 57 document authors. All rights reserved.

59 This document is subject to BCP 78 and the IETF Trust's Legal 60 Provisions Relating to IETF Documents 61 (http://trustee.ietf.org/license-info) in effect on the date of 62 publication of this document. Please review these documents 63 carefully, as they describe your rights and restrictions with respect 64 to this document. Code Components extracted from this document must 65 include Simplified BSD License text as described in Section 4.e of 66 the Trust Legal Provisions and are provided without warranty as 67 described in the Simplified BSD License.

69 Table of Contents

71 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 5 72 1.1. TODO . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 73 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 7 74 3. New requirements relative to RFC 2330 . . . . . . . . . . . . 11 75 4. Background . . . . . . . . . . . . . . . . . . . . . . . . . . 11 76 4.1. TCP properties . . . . . . . . . . . . . . . . . . . . . . 12 77 4.2. Diagnostic Approach . . . . . . . . . . . . . . . . . . . 14 78 5. Common Models and Parameters . . . . . . . . . . . . . . . . . 15 79 5.1. Target End-to-end parameters . . . . . . . . . . . . . . . 15 80 5.2. Common Model Calculations . . . . . . . . . . . . . . . . 16 81 5.3. Parameter Derating . . . . . . . . . . . . . . . . . . . . 17 82 6. Common testing procedures . . . . . . . . . . . . . . . . . . 17 83 6.1. Traffic generating techniques . . . . . . . . . . . . . . 17 84 6.1.1. Paced transmission . . . . . . . . . . . . . . . . . . 17 85 6.1.2. Constant window pseudo CBR . . . . . . . . . .
. . . . 18 86 6.1.3. Scanned window pseudo CBR . . . . . . . . . . . . . . 19 87 6.1.4. Concurrent or channelized testing . . . . . . . . . . 19 88 6.2. Interpreting the Results . . . . . . . . . . . . . . . . . 20 89 6.2.1. Test outcomes . . . . . . . . . . . . . . . . . . . . 20 90 6.2.2. Statistical criteria for measuring run_length . . . . 22 91 6.2.2.1. Alternate criteria for measuring run_length . . . 23 92 6.2.3. Reordering Tolerance . . . . . . . . . . . . . . . . . 25 93 6.3. Test Preconditions . . . . . . . . . . . . . . . . . . . . 25 94 7. Diagnostic Tests . . . . . . . . . . . . . . . . . . . . . . . 26 95 7.1. Basic Data Rate and Delivery Statistics Tests . . . . . . 26 96 7.1.1. Delivery Statistics at Paced Full Data Rate . . . . . 27 97 7.1.2. Delivery Statistics at Full Data Windowed Rate . . . . 27 98 7.1.3. Background Delivery Statistics Tests . . . . . . . . . 27 99 7.2. Standing Queue Tests . . . . . . . . . . . . . . . . . . . 28 100 7.2.1. Congestion Avoidance . . . . . . . . . . . . . . . . . 29 101 7.2.2. Bufferbloat . . . . . . . . . . . . . . . . . . . . . 29 102 7.2.3. Non excessive loss . . . . . . . . . . . . . . . . . . 30 103 7.2.4. Duplex Self Interference . . . . . . . . . . . . . . . 30 104 7.3. Slowstart tests . . . . . . . . . . . . . . . . . . . . . 30 105 7.3.1. Full Window slowstart test . . . . . . . . . . . . . . 31 106 7.3.2. Slowstart AQM test . . . . . . . . . . . . . . . . . . 31 107 7.4. Sender Rate Burst tests . . . . . . . . . . . . . . . . . 31 108 7.5. Combined Tests . . . . . . . . . . . . . . . . . . . . . . 32 109 7.5.1. Sustained burst test . . . . . . . . . . . . . . . . . 32 110 7.5.2. Streaming Media . . . . . . . . . . . . . . . . . . . 33 111 8. An Example . . . . . . . . . . . . . . . . . . . . . . . . . . 34 112 9. Validation . . . . . . . . . . . . . . . . . . . . . . . . . . 35 113 10. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 37 114 11. Informative References . . . . . . . . . . . . . . . . . . . . 37 115 Appendix A. Model Derivations . . . . . . . . . . . . . . . . . . 40 116 A.1. Queueless Reno . . . . . . . . . . . . . . . . . . . . . . 40 117 A.2. CUBIC . . . . . . . . . . . . . . . . . . . . . . . . . . 41 118 Appendix B. Complex Queueing . . . . . . . . . . . . . . . . . . 42 119 Appendix C. Version Control . . . . . . . . . . . . . . . . . . . 43 120 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 43 122 1. Introduction 124 Bulk performance metrics evaluate an Internet path's ability to carry 125 bulk data. Model based bulk performance metrics rely on mathematical 126 TCP models to design a targeted diagnostic suite (TDS) of IP 127 performance tests which can be applied independently to each subpath 128 of the full end-to-end path. These targeted diagnostic suites allow 129 independent tests of subpaths to accurately detect if any subpath 130 will prevent the full end-to-end path from delivering bulk data at 131 the specified performance target, independent of the measurement 132 vantage points or other details of the test procedures used for each 133 measurement. 135 The end-to-end target performance is determined by the needs of the 136 user or application, outside the scope of this document. For bulk 137 data transport, the primary performance parameter of interest is the 138 target data rate. 
However, since TCP's ability to compensate for 139 less than ideal network conditions is fundamentally affected by the 140 Round Trip Time (RTT) and the Maximum Transmission Unit (MTU) of the 141 entire end-to-end path that the data traverses, these 142 parameters must also be specified in advance. They may reflect a 143 specific real path through the Internet or an idealized path 144 representing a typical user community. The target values for these 145 three parameters, Data Rate, RTT and MTU, inform the mathematical 146 models used to design the TDS.

148 Each IP diagnostic test in a TDS consists of a precomputed traffic 149 pattern and statistical criteria for evaluating packet delivery.

151 Mathematical models are used to design traffic patterns that mimic 152 TCP or another bulk transport protocol operating at the target data 153 rate, MTU and RTT over a full range of conditions, including flows 154 that are bursty at multiple time scales. The traffic patterns are 155 computed in advance based on the three target parameters of the end- 156 to-end path and independent of the properties of individual subpaths. 157 As much as possible the measurement traffic is generated 158 deterministically in ways that minimize the extent to which test 159 methodology, measurement points, measurement vantage or path 160 partitioning affect the details of the measurement traffic.

162 Mathematical models are also used to compute the bounds on the packet 163 delivery statistics for acceptable IP performance. Since these 164 statistics, such as packet loss, are typically aggregated from all 165 subpaths of the end-to-end path, the end-to-end statistical bounds 166 need to be apportioned as a separate bound for each subpath. Note 167 that links that are expected to be bottlenecks will naturally 168 contribute more packet loss and/or delay. In compensation, other 169 links have to be constrained to contribute less packet loss and 170 delay. The criterion for passing each test of a TDS is an apportioned 171 share of the total bound determined by the mathematical model from 172 the end-to-end target performance.

174 In addition to passing or failing, a test can be deemed to be 175 inconclusive for a number of reasons, including: the precomputed 176 traffic pattern was not accurately generated; the measurement results 177 were not statistically significant; or other causes such as failing to 178 meet some required test preconditions.

180 This document describes a framework for deriving traffic patterns and 181 delivery statistics for model based metrics. It does not fully 182 specify any measurement techniques. Important details such as packet 183 type-p selection, sampling techniques, vantage selection, etc. are 184 not specified here. We imagine Fully Specified Targeted Diagnostic 185 Suites (FSTDS) that define all of these details. We use TDS to 186 refer to the subset of such a specification that is in scope for this 187 document. A TDS includes the target parameters, documentation of the 188 models and assumptions used to derive the diagnostic test parameters, 189 specifications for the traffic and delivery statistics for the tests 190 themselves, and a description of a test setup that can be used to 191 validate the tests and models.

193 Section 2 defines terminology used throughout this document.
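To make the role of the three target parameters concrete, the following minimal sketch computes the two derived quantities of the reference model in Section 5.2. The variable names follow the terminology of Section 2; the numeric values are hypothetical examples, not recommendations.

<CODE BEGINS>
# Python sketch: derive model parameters from end-to-end targets.
# All numeric values are hypothetical, for illustration only.

target_rate = 1.25e6      # target data rate [bytes/s] (10 Mb/s)
target_RTT = 0.1          # target round trip time [s]
target_MTU = 1500         # target MTU [bytes]
header_overhead = 52      # TCP/IP header bytes per packet

# Average window, in packets, needed to meet the target rate
# (Section 5.2).
target_pipe_size = (target_rate * target_RTT /
                    (target_MTU - header_overhead))

# Reference minimum spacing between losses or ECN marks
# (Section 5.2, following [MSMO97]).
target_run_length = 3 * target_pipe_size ** 2

print(round(target_pipe_size))   # ~86 packets
print(round(target_run_length))  # ~22,000 packets between marks
<CODE ENDS>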
195 It has been difficult to develop Bulk Transport Capacity [RFC3148] 196 metrics due to some overlooked requirements described in Section 3 197 and some intrinsic problems with using protocols for measurement, 198 described in Section 4.

200 In Section 5 we describe the models and common parameters used to 201 derive the targeted diagnostic suite. In Section 6 we describe 202 common testing procedures. Each subpath is evaluated using a suite of 203 far simpler and more predictable diagnostic tests described in 204 Section 7. In Section 8 we present an example TDS that might be 205 representative of HD video, and illustrate how MBM can be used to 206 address difficult measurement situations, such as confirming that 207 intercarrier exchanges have sufficient performance and capacity to 208 deliver HD video between ISPs.

210 There exists a small risk that the model based metrics themselves might yield 211 a false pass result, in the sense that every subpath of an end-to-end 212 path passes every IP diagnostic test and yet a real application fails 213 to attain the performance target over the end-to-end path. If this 214 happens, then the validation procedure described in Section 9 needs 215 to be used to validate and potentially revise the models.

217 Future documents will define model based metrics for other traffic 218 classes and application types, such as real time streaming media.

220 1.1. TODO

222 Please send comments about this draft to ippm@ietf.org. See 223 http://goo.gl/02tkD for more information including: interim drafts, 224 an up to date todo list and information on contributing.

226 Formatted: Thu Jul 3 20:19:04 PDT 2014

228 2. Terminology

230 Terminology about paths, etc. See [RFC2330] and 231 [I-D.ietf-ippm-lmap-path].

233 [data] sender Host sending data and receiving ACKs. 234 [data] receiver Host receiving data and sending ACKs. 235 subpath A portion of the full path. Note that there is no 236 requirement that subpaths be non-overlapping. 237 Measurement Point Measurement points as described in 238 [I-D.ietf-ippm-lmap-path]. 239 test path A path between two measurement points that includes a 240 subpath of the end-to-end path under test, and could include 241 infrastructure between the measurement points and the subpath. 242 [Dominant] Bottleneck The bottleneck that generally dominates 243 traffic statistics for the entire path. It typically determines a 244 flow's self clock timing, packet loss and ECN marking rate. See 245 Section 4.1. 246 front path The subpath from the data sender to the dominant 247 bottleneck. 248 back path The subpath from the dominant bottleneck to the receiver. 249 return path The path taken by the ACKs from the data receiver to the 250 data sender. 251 cross traffic Other, potentially interfering, traffic competing for 252 network resources (bandwidth and/or queue capacity).

254 Properties determined by the end-to-end path and application. They 255 are described in more detail in Section 5.1.

257 Application Data Rate General term for the data rate as seen by the 258 application above the transport layer. This is the payload data 259 rate, and excludes transport and lower level headers (TCP/IP or 260 other protocols) as well as retransmissions and other data 261 that does not contribute to the total quantity of data delivered 262 to the application.

264 Link Data Rate General term for the data rate as seen by the link or 265 lower layers. The link data rate includes transport and IP 266 headers, retransmits and other transport layer overhead.
This 267 document is agnostic as to whether the link data rate includes or 268 excludes framing, MAC, or other lower layer overheads, except that 269 they must be treated uniformly. 270 end-to-end target parameters: Application or transport performance 271 goals for the end-to-end path. They include the target data rate, 272 RTT and MTU described below. 273 Target Data Rate: The application data rate, typically the ultimate 274 user's performance goal. 275 Target RTT (Round Trip Time): The baseline (minimum) RTT of the 276 longest end-to-end path over which the application expects to be 277 able to meet the target performance. TCP and other transport 278 protocols' ability to compensate for path problems is generally 279 proportional to the number of round trips per second. The Target 280 RTT determines both key parameters of the traffic patterns (e.g. 281 burst sizes) and the thresholds on acceptable traffic statistics. 282 The Target RTT must be specified considering authentic packet 283 sizes: MTU sized packets on the forward path, ACK sized packets 284 (typically header_overhead) on the return path. 285 Target MTU (Maximum Transmission Unit): The maximum MTU supported by 286 the end-to-end path over which the application expects to meet 287 the target performance. Assume 1500 Byte packets unless otherwise 288 specified. If some subpath forces a smaller MTU, then it becomes 289 the target MTU, and all model calculations and subpath tests must 290 use the same smaller MTU. 291 Effective Bottleneck Data Rate: This is the bottleneck data rate 292 inferred from the ACK stream, by looking at how much data the ACK 293 stream reports delivered per unit time. If the path is thinning 294 ACKs or batching packets the effective bottleneck rate can be much 295 higher than the average link rate. See Section 4.1 and Appendix B 296 for more details. 297 [sender | interface] rate: The burst data rate, constrained by the 298 data sender's interfaces. Today 1 or 10 Gb/s are typical. 299 Header_overhead: The IP and TCP header sizes, which are the portion 300 of each MTU not available for carrying application payload. 301 Without loss of generality this is assumed to be the size for 302 returning acknowledgements (ACKs). For TCP, the Maximum Segment 303 Size (MSS) is the Target MTU minus the header_overhead.

305 Basic parameters common to models and subpath tests. They are 306 described in more detail in Section 5.2. Note that these are mixed 307 between application transport performance (excludes headers) and link 308 IP performance (includes headers).

310 pipe size A general term for the number of packets needed in flight (the 311 window size) to exactly fill some network path or subpath. This 312 is the window size at which queueing normally begins. 313 target_pipe_size: The number of packets in flight (the window size) 314 needed to exactly meet the target rate, with a single stream and 315 no cross traffic for the specified application target data rate, 316 RTT, and MTU. It is the amount of circulating data required to 317 meet the target data rate, and implies the scale of the bursts 318 that the network might experience. 319 Delivery Statistics Raw or summary statistics about packet delivery, 320 packet losses, ECN marks, reordering, or any other properties of 321 packet delivery that may be germane to transport performance. 322 run length A general term for the observed, measured, or specified 323 number of packets that are (to be) delivered between losses or ECN 324 marks.
Nominally one over the loss or ECN marking probability, if 325 marks are independently and identically distributed. 326 target_run_length The target_run_length is an estimate of the 327 minimum required headway between losses or ECN marks necessary to 328 attain the target_data_rate over a path with the specified 329 target_RTT and target_MTU, as computed by a mathematical model of 330 TCP congestion control. A reference calculation is shown in 331 Section 5.2 and alternatives in Appendix A.

333 Ancillary parameters used for some tests

335 derating: Under some conditions the standard models are too 336 conservative. The modeling framework permits some latitude in 337 relaxing or "derating" some test parameters as described in 338 Section 5.3 in exchange for more stringent TDS validation 339 procedures, described in Section 9. 340 subpath_data_rate The maximum IP data rate supported by a subpath. 341 This typically includes TCP/IP overhead, including headers, 342 retransmits, etc. 343 test_path_RTT The RTT between two measurement points using 344 appropriate data and ACK packet sizes. 345 test_path_pipe The amount of data necessary to fill a test path. 346 Nominally the test path RTT times the subpath_data_rate (which 347 should be part of the end-to-end subpath). 348 test_window The window necessary to meet the target_rate over a 349 subpath. Typically test_window=target_data_rate*test_RTT/ 350 (target_MTU - header_overhead). A worked sketch appears after the 351 test classification below.

352 Tests can be classified into groups according to their applicability.

354 Capacity tests determine if a network subpath has sufficient 355 capacity to deliver the target performance. As long as the test 356 traffic is within the proper envelope for the target end-to-end 357 performance, the average packet loss or ECN marking rate must be below the 358 threshold computed by the model. As such, capacity tests reflect 359 parameters that can transition from passing to failing as a 360 consequence of cross traffic, additional presented load or the 361 actions of other network users. By definition, capacity tests 362 also consume significant network resources (data capacity and/or 363 buffer space), and the test schedules must be balanced by their 364 cost. 365 Monitoring tests are designed to capture the most important aspects 366 of a capacity test, but without presenting excessive ongoing load 367 themselves. As such they may miss some details of the network's 368 performance, but can serve as a useful reduced-cost proxy for a 369 capacity test. 370 Engineering tests evaluate how network algorithms (such as AQM and 371 channel allocation) interact with TCP-style self clocked protocols 372 and adaptive congestion control based on packet loss and ECN 373 marks. These tests are likely to have complicated interactions 374 with other traffic and under some conditions can be inversely 375 sensitive to load. For example a test to verify that an AQM 376 algorithm causes ECN marks or packet drops early enough to limit 377 queue occupancy may experience a false pass result in the presence 378 of bursty cross traffic. It is important that engineering tests 379 be performed under a wide range of conditions, including both in 380 situ and bench testing, and over a wide variety of load 381 conditions. Ongoing monitoring is less likely to be useful for 382 engineering tests, although sparse in situ testing might be 383 appropriate.
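As forecast in the test_window definition above, a minimal sketch of the ancillary window calculation for one concrete test path. The numeric values are hypothetical and the formula is the test_window definition given earlier.

<CODE BEGINS>
# Python sketch: ancillary parameters for one test path.
# Numeric values are hypothetical, for illustration only.

target_data_rate = 1.25e6   # [bytes/s] (10 Mb/s)
target_MTU = 1500           # [bytes]
header_overhead = 52        # [bytes]
test_path_RTT = 0.04        # measured RTT between the MPs [s]

# Window needed to meet the target_rate over this test path,
# per the test_window definition above.
test_window = (target_data_rate * test_path_RTT /
               (target_MTU - header_overhead))
print(round(test_window))   # ~35 packets
<CODE ENDS>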
385 General Terminology:

387 Targeted Diagnostic Suite (TDS) A set of IP Diagnostics designed to 388 determine if a subpath can sustain flows at a specific 389 target_data_rate over a path that has a target_RTT using 390 target_MTU sized packets. 391 Fully Specified Targeted Diagnostic Suite (FSTDS) A TDS together with 392 additional specifications such as "type-p", etc., which are out of 393 scope for this document, but need to be drawn from other standards 394 documents. 395 apportioned To divide and allocate, as in budgeting packet loss 396 rates across multiple subpaths so that they accumulate to less than a specified 397 end-to-end loss rate.

399 open loop A control theory term used to describe a class of 400 techniques where systems that exhibit circular dependencies can be 401 analyzed by suppressing some of the dependencies, such that the 402 resulting dependency graph is acyclic.

404 3. New requirements relative to RFC 2330

406 Model Based Metrics are designed to fulfill some additional 407 requirements that were not recognized at the time RFC 2330 was written 408 [RFC2330]. These missing requirements may have significantly 409 contributed to policy difficulties in the IP measurement space. Some 410 additional requirements are: 411 o IP metrics must be actionable by the ISP - they have to be 412 interpreted in terms of behaviors or properties at the IP or lower 413 layers that an ISP can test, repair and verify. 414 o Metrics must be vantage point invariant over a significant range 415 of measurement point choices, including off path measurement 416 points. The only requirements on MP selection should be that the 417 portion of the test path that is not under test is effectively 418 ideal (or is non ideal in ways that can be calibrated out of the 419 measurements) and the test RTT between the MPs is below some 420 reasonable bound. 421 o Metrics must be repeatable by multiple parties with no specialized 422 access to MPs or diagnostic infrastructure. It must be possible 423 for different parties to make the same measurement and observe the 424 same results. In particular it is specifically important that 425 both a consumer (or their delegate) and ISP be able to perform the 426 same measurement and get the same result.

428 NB: All of the metric requirements in RFC 2330 should be reviewed and 429 potentially revised. If such a document is opened soon enough, this 430 entire section should be dropped.

432 4. Background

434 At the time the IPPM WG was chartered, sound Bulk Transport Capacity 435 measurement was known to be beyond our capabilities. In hindsight it 436 is now clear why it is such a hard problem: 437 o TCP is a control system with circular dependencies - everything 438 affects performance, including components that are explicitly not 439 part of the test. 440 o Congestion control is an equilibrium process, such that transport 441 protocols change the network (raise loss probability and/or RTT) 442 to conform to their behavior.

444 o TCP's ability to compensate for network flaws is directly 445 proportional to the number of roundtrips per second (i.e. 446 inversely proportional to the RTT). As a consequence a flawed 447 link may pass a short RTT local test even though it fails when the 448 path is extended by a perfect network to some larger RTT. 449 o TCP has a meta Heisenberg problem - Measurement and cross traffic 450 interact in unknown and ill-defined ways.
The situation is 451 actually worse than the traditional physics problem where you can 452 at least estimate the relative momentum of the measurement and 453 measured particles. For network measurement you can not in 454 general determine the relative "elasticity" of the measurement 455 traffic and cross traffic, so you can not even gauge the relative 456 magnitude of their effects on each other.

458 These properties are a consequence of the equilibrium behavior 459 intrinsic to how all throughput optimizing protocols interact with 460 the network. The protocols rely on control systems based on multiple 461 network estimators to regulate the quantity of data sent into the 462 network. The data in turn alters the network and the properties observed 463 by the estimators, such that there are circular dependencies between 464 every component and every property. Since some of these estimators 465 are non-linear, the entire system is nonlinear, and any change 466 anywhere causes difficult to predict changes in every parameter.

468 Model Based Metrics overcome these problems by forcing the 469 measurement system to be open loop: the delivery statistics (akin to 470 the network estimators) do not affect the traffic. The traffic and 471 traffic patterns (bursts) are computed on the basis of the target 472 performance. In order for a network to pass, the resulting delivery 473 statistics and corresponding network estimators have to be such that 474 they would not cause the control systems to slow the traffic below the 475 target rate.

477 4.1. TCP properties

479 TCP and SCTP are self clocked protocols. The dominant steady state 480 behavior is to have an approximately fixed quantity of data and 481 acknowledgements (ACKs) circulating in the network. The receiver 482 reports arriving data by returning ACKs to the data sender, and the data 483 sender typically responds by sending exactly the same quantity of 484 data back into the network. The total quantity of data plus the data 485 represented by ACKs circulating in the network is referred to as the 486 window. The mandatory congestion control algorithms incrementally 487 adjust the window by sending slightly more or less data in response 488 to each ACK. The fundamentally important property of this system is 489 that it is entirely self clocked: The data transmissions are a 490 reflection of the ACKs that were delivered by the network, the ACKs 491 are a reflection of the data arriving from the network.

493 A number of phenomena can cause bursts of data, even in idealized 494 networks that are modeled as simple queueing systems.

496 During slowstart the data rate is doubled on each RTT by sending 497 twice as much data as was delivered to the receiver on the prior RTT. 498 For slowstart to be able to fill such a network, the network must be 499 able to tolerate slowstart bursts up to the full pipe size inflated 500 by the anticipated window reduction on the first loss or ECN mark. 501 For example, with classic Reno congestion control, an optimal 502 slowstart has to end with a burst that is twice the bottleneck rate 503 for exactly one RTT in duration. This burst causes a queue which is 504 exactly equal to the pipe size (i.e. the window is exactly twice the 505 pipe size) so when the window is halved in response to the first 506 loss, the new window will be exactly the pipe size.
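A worked instance of this Reno slowstart arithmetic, as a minimal sketch (the pipe size value is hypothetical):

<CODE BEGINS>
# Python sketch of the Reno slowstart arithmetic described above.
# pipe_size is hypothetical; units are packets.

pipe_size = 100                 # packets needed to just fill the path

# An optimal slowstart ends with one RTT at twice the bottleneck
# rate, i.e. a window of twice the pipe size ...
final_slowstart_window = 2 * pipe_size

# ... which leaves a standing queue equal to the pipe size ...
queue_at_first_loss = final_slowstart_window - pipe_size

# ... so halving the window on the first loss or ECN mark lands
# exactly on the pipe size: the path stays full, the queue drains.
window_after_loss = final_slowstart_window // 2
assert window_after_loss == pipe_size == queue_at_first_loss
<CODE ENDS>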
508 Note that if the bottleneck data rate is significantly slower than 509 the rest of the path, the slowstart bursts will not cause significant 510 queues anywhere else along the path; they primarily exercise the 511 queue at the dominant bottleneck.

513 Other sources of bursts include application pauses and channel 514 allocation mechanisms. Appendix B describes the treatment of channel 515 allocation systems. If the application pauses (stops reading or 516 writing data) for some fraction of one RTT, state-of-the-art TCP 517 catches up to the earlier window size by sending a burst of data at 518 the full sender interface rate. To fill such a network with a 519 realistic application, the network has to be able to tolerate 520 interface rate bursts from the data sender large enough to cover 521 application pauses.

523 Although the interface rate bursts are typically smaller than the last 524 burst of a slowstart, they are at a higher data rate so they 525 potentially exercise queues at arbitrary points along the front path 526 from the data sender up to and including the queue at the dominant 527 bottleneck. There is no model for what frequency or size of 528 sender rate bursts should be tolerated.

530 To verify that a path can meet a performance target, it is necessary 531 to independently confirm that the path can tolerate bursts in the 532 dimensions that can be caused by these mechanisms. Three cases are 533 likely to be sufficient:

535 o Slowstart bursts sufficient to get connections started properly. 536 o Frequent sender interface rate bursts that are small enough that 537 they can be assumed not to significantly affect delivery 538 statistics. (Implicitly derated by selecting the burst size).

540 o Infrequent sender interface rate full target_pipe_size bursts that 541 do affect the delivery statistics. (Target_run_length is 542 derated).

544 4.2. Diagnostic Approach

546 The MBM approach is to open loop TCP by precomputing traffic patterns 547 that are typically generated by TCP operating at the given target 548 parameters, and evaluating delivery statistics (packet loss, ECN 549 marks and delay). In this approach the measurement software 550 explicitly controls the data rate, transmission pattern or cwnd 551 (TCP's primary congestion control state variables) to create 552 repeatable traffic patterns that mimic TCP behavior but are 553 independent of the actual behavior of the subpath under test. These 554 patterns are manipulated to probe the network to verify that it can 555 deliver all of the traffic patterns that a transport protocol is 556 likely to generate under normal operation at the target rate and RTT.

558 By opening the protocol control loops, we remove most sources of 559 temporal and spatial correlation in the traffic delivery statistics, 560 such that each subpath's contribution to the end-to-end statistics 561 can be assumed to be independent and stationary. (The delivery 562 statistics depend on the fine structure of the data transmissions, 563 but not on long time scale state embedded in the sender, receiver or 564 other network components.) Therefore each subpath's contribution to 565 the end-to-end delivery statistics can be assumed to be independent, 566 and spatial composition techniques such as [RFC5835] and [RFC6049] 567 apply.

569 In typical networks, the dominant bottleneck contributes the majority 570 of the packet loss and ECN marks. Often the rest of the path makes 571 an insignificant contribution to these properties.
A TDS should 572 apportion the end-to-end budget for the specified parameters 573 (primarily packet loss and ECN marks) to each subpath or group of 574 subpaths. For example the dominant bottleneck may be permitted to 575 contribute 90% of the loss budget, while the rest of the path is only 576 permitted to contribute 10%.

578 A TDS or FSTDS MUST apportion all relevant packet delivery statistics 579 between different subpaths, such that the spatial composition of the 580 apportioned metrics yields end-to-end statistics which are within the 581 bounds determined by the models.

583 A network is expected to be able to sustain a Bulk TCP flow of a 584 given data rate, MTU and RTT when the following conditions are met: 585 o The raw link rate is higher than the target data rate.

587 o The observed delivery statistics are better than required by a 588 suitable TCP performance model (e.g. fewer losses). 589 o There is sufficient buffering at the dominant bottleneck to absorb 590 a slowstart rate burst large enough to get the flow out of 591 slowstart at a suitable window size. 592 o There is sufficient buffering in the front path to absorb and 593 smooth sender interface rate bursts at all scales that are likely 594 to be generated by the application, any channel arbitration in the 595 ACK path or other mechanisms. 596 o When there is a standing queue at a bottleneck for a shared media 597 subpath, there are suitable bounds on how the data and ACKs 598 interact, for example due to the channel arbitration mechanism. 599 o When there is a slowly rising standing queue at the bottleneck the 600 onset of packet loss has to be at an appropriate point (time or 601 queue depth) and progressive. This typically requires some form 602 of Active Queue Management [RFC2309].

604 We are developing a tool that can perform many of the tests described 605 here [MBMSource].

607 5. Common Models and Parameters

609 5.1. Target End-to-end parameters

611 The target end-to-end parameters are the target data rate, target RTT 612 and target MTU as defined in Section 2. These parameters are 613 determined by the needs of the application or the ultimate end user 614 and the end-to-end Internet path over which the application is 615 expected to operate. The target parameters are in units that make 616 sense to upper layers: payload bytes delivered to the application, 617 above TCP. They exclude overheads associated with TCP and IP 618 headers, retransmits and other protocols (e.g. DNS).

620 Other end-to-end parameters defined in Section 2 include the 621 effective bottleneck data rate, the sender interface data rate and 622 the TCP/IP header sizes (overhead).

624 The target data rate must be smaller than all link data rates by 625 enough headroom to carry the transport protocol overhead, explicitly 626 including retransmissions and an allowance for fluctuations in the 627 actual data rate needed to meet the specified average rate. 628 Specifying a target rate with insufficient headroom is likely to 629 result in brittle measurements having little predictive value.

631 Note that the target parameters can be specified for a hypothetical 632 path, for example to construct a TDS designed for bench testing in the 633 absence of a real application, or for a real physical test, for in 634 situ testing of production infrastructure.

636 The number of concurrent connections is explicitly not a parameter to 637 this model.
If a subpath requires multiple connections in order to 638 meet the specified performance, that must be stated explicitly and 639 the procedure described in Section 6.1.4 applies.

641 5.2. Common Model Calculations

643 The end-to-end target parameters are used to derive the 644 target_pipe_size and the reference target_run_length.

646 The target_pipe_size is the average window size in packets needed to 647 meet the target rate, for the specified target RTT and MTU. It is 648 given by:

650 target_pipe_size = target_rate * target_RTT / ( target_MTU - 651 header_overhead )

653 Target_run_length is an estimate of the minimum required headway 654 between losses or ECN marks, as computed by a mathematical model of 655 TCP congestion control. The derivation here follows [MSMO97], and by 656 design is quite conservative. The alternate models described in 657 Appendix A generally yield smaller run_lengths (higher loss rates), 658 but may not apply in all situations. In any case alternate models 659 should be compared to the reference target_run_length computed here.

661 Reference target_run_length is derived as follows: assume the 662 subpath_data_rate is infinitesimally larger than the target_data_rate 663 plus the required header_overhead. Then target_pipe_size also 664 predicts the onset of queueing. A larger window will cause a 665 standing queue at the bottleneck.

667 Assume the transport protocol is using standard Reno style Additive 668 Increase, Multiplicative Decrease congestion control [RFC5681] (but 669 not Appropriate Byte Counting [RFC3465]) and the receiver is using 670 standard delayed ACKs. Reno increases the window by one packet every 671 pipe_size worth of ACKs. With delayed ACKs this takes 2 Round Trip 672 Times per increase. To exactly fill the pipe, losses must be no 673 closer together than when the peak of the AIMD sawtooth reaches exactly twice 674 the target_pipe_size; otherwise the multiplicative window reduction 675 triggered by the loss would cause the network to be underfilled. 676 Following [MSMO97] the number of packets between losses must be the 677 area under the AIMD sawtooth. They must be no more frequent than 678 every 1 in ((3/2)*target_pipe_size)*(2*target_pipe_size) packets, 679 which simplifies to:

681 target_run_length = 3*(target_pipe_size^2)

682 Note that this calculation is very conservative and is based on a 683 number of assumptions that may not apply. Appendix A discusses these 684 assumptions and provides some alternative models. If a different 685 model is used, a fully specified TDS or FSTDS MUST document the 686 actual method for computing target_run_length along with the 687 rationale for the underlying assumptions and the ratio of chosen 688 target_run_length to the reference target_run_length calculated 689 above.

691 These two parameters, target_pipe_size and target_run_length, 692 directly imply most of the individual parameters for the tests in 693 Section 7.

695 5.3. Parameter Derating

697 Since some aspects of the models are very conservative, this 698 framework permits some latitude in derating test parameters. Rather 699 than trying to formalize more complicated models we permit some test 700 parameters to be relaxed as long as they meet some additional 701 procedural constraints: 702 o The TDS or FSTDS MUST document and justify the actual method used 703 to compute the derated metric parameters.
704 o The validation procedures described in Section 9 must be used to 705 demonstrate the feasibility of meeting the performance targets 706 with infrastructure that infinitesimally passes the derated tests. 707 o The validation process itself must be documented in such a way 708 that other researchers can duplicate the validation experiments.

710 Except as noted, all tests below assume no derating. Tests where 711 there is not currently a well established model for the required 712 parameters explicitly include derating as a way to indicate 713 flexibility in the parameters.

715 6. Common testing procedures

717 6.1. Traffic generating techniques

719 6.1.1. Paced transmission

721 Paced (burst) transmissions: send bursts of data on a timer to meet a 722 particular target rate and pattern. In all cases the specified data 723 rate can be either the application or the link rate. Header overheads 724 must be included in the calculations as appropriate.

726 Paced single packets: Send individual packets at the specified rate 727 or headway. 728 Burst: Send sender interface rate bursts on a timer. Specify any 3 729 of: average rate, packet size, burst size (number of packets) and 730 burst headway (burst start to start). These bursts are typically 731 sent as back-to-back packets at the tester's interface rate. 732 Slowstart bursts: Send 4 packet sender interface rate bursts at an 733 average data rate equal to twice the effective bottleneck link rate 734 (but not more than the sender interface rate). This corresponds 735 to the average rate during a TCP slowstart when Appropriate Byte 736 Counting [RFC3465] is present or delayed ACK is disabled. Note 737 that if the effective bottleneck link rate is more than half of 738 the sender interface rate, slowstart bursts become sender 739 interface rate bursts. 740 Repeated Slowstart bursts: Slowstart bursts are typically part of a 741 larger scale pattern of repeated bursts, such as sending 742 target_pipe_size packets as slowstart bursts on a target_RTT 743 headway (burst start to burst start). Such a stream has three 744 different average rates, depending on the averaging interval. At 745 the finest time scale the average rate is the same as the sender 746 interface rate, at a medium scale the average rate is twice the 747 effective bottleneck link rate and at the longest time scales the 748 average rate is equal to the target data rate.

750 Note that in conventional measurement theory, exponential 751 distributions are often used to eliminate many sorts of correlations. 752 For the procedures above, the correlations are created by the network 753 elements and accurately reflect their behavior. At some point in the 754 future, it may be desirable to introduce noise sources into the above 755 pacing models, but they are not warranted at this time.

757 6.1.2. Constant window pseudo CBR

759 Implement pseudo constant bit rate by running a standard protocol 760 such as TCP with a fixed window size. The rate is only maintained on 761 average over each RTT, and is subject to limitations of the transport 762 protocol.

764 The window size is computed from the target_data_rate and the actual 765 RTT of the test path.

767 If the transport protocol fails to maintain the test rate within 768 prescribed limits the test would typically be considered inconclusive 769 or failing, depending on what mechanism caused the reduced rate. See 770 the discussion of test outcomes in Section 6.2.1.

772 6.1.3.
Scanned window pseudo CBR

774 Same as the above, except the window is scanned across a range of 775 sizes designed to include two key events, the onset of queueing and 776 the onset of packet loss or ECN marks. The window is scanned by 777 incrementing it by one packet for every 2*target_pipe_size delivered 778 packets. This mimics the additive increase phase of standard TCP 779 congestion avoidance and normally separates the window increases 780 by approximately twice the target_RTT.

782 There are two versions of this test: one built by applying a window 783 clamp to standard congestion control and the other built by 784 stiffening a non-standard transport protocol. When standard 785 congestion control is in effect, any losses or ECN marks cause the 786 transport to revert to a window smaller than the clamp such that the 787 scanning clamp loses control of the window size. The NPAD pathdiag tool 788 is an example of this class of algorithms [Pathdiag].

790 Alternatively a non-standard congestion control algorithm can respond 791 to losses by transmitting extra data, such that it maintains the 792 specified window size independent of losses or ECN marks. Such a 793 stiffened transport explicitly violates mandatory Internet congestion 794 control and is not suitable for in situ testing. It is only 795 appropriate for engineering testing under laboratory conditions. The 796 Windowed Ping tool implemented such a test [WPING]. The tool 797 described in the paper has been updated [mpingSource].

799 The test procedures in Section 7.2 describe how to partition the 800 scans into regions and how to interpret the results.

802 6.1.4. Concurrent or channelized testing

804 The procedures described in this document are only directly 805 applicable to single stream performance measurement, e.g. one TCP 806 connection. In an ideal world, we would disallow all performance 807 claims based on multiple concurrent streams, but this is not practical 808 due to at least two different issues. First, many very high rate 809 link technologies are channelized and pin individual flows to 810 specific channels to minimize reordering or other problems and 811 second, TCP itself has scaling limits. Although the former problem 812 might be overcome through different design decisions, the latter 813 problem is more deeply rooted.

815 All standard [RFC5681] and de facto standard congestion control 816 algorithms [CUBIC] have scaling limits, in the sense that as a long 817 fast network (LFN) with a fixed RTT and MTU gets faster, these 818 congestion control algorithms get less accurate and as a consequence 819 have difficulty filling the network [CCscaling]. These properties are 820 a consequence of the original Reno AIMD congestion control design and 821 the requirement in [RFC5681] that all transport protocols have a 822 uniform response to congestion.

824 There are a number of reasons to want to specify performance in terms 825 of multiple concurrent flows, however this approach is not 826 recommended for data rates below several megabits per second, which 827 can be attained with run lengths under 10000 packets. Since the 828 required run length goes as the square of the data rate, at higher 829 rates the run lengths can be unreasonably large, and multiple 830 connections might be the only feasible approach.

832 If multiple connections are deemed necessary to meet aggregate 833 performance targets then this MUST be stated both in the design of the 834 TDS and in any claims about network performance.
The tests MUST be 835 performed concurrently with the specified number of connections. For 836 the tests that use bursty traffic, the bursts should be 837 synchronized across flows.

839 6.2. Interpreting the Results

841 6.2.1. Test outcomes

843 To perform an exhaustive test of an end-to-end network path, each 844 test of the TDS is applied to each subpath of an end-to-end path. If 845 any subpath fails any test then an application running over the end- 846 to-end path can also be expected to fail to attain the target 847 performance under some conditions.

849 In addition to passing or failing, a test can be deemed to be 850 inconclusive for a number of reasons. Proper instrumentation and 851 treatment of inconclusive outcomes is critical to the accuracy and 852 robustness of Model Based Metrics. Tests can be inconclusive if the 853 precomputed traffic pattern or data rates were not accurately 854 generated; the measurement results were not statistically 855 significant; or other causes such as failing to meet some required 856 preconditions for the test.

858 For example consider a test that implements Constant Window Pseudo 859 CBR (Section 6.1.2) by adding rate controls and detailed traffic 860 instrumentation to TCP (e.g. [RFC4898]). TCP includes built in 861 control systems which might interfere with the sending data rate. If 862 such a test meets the required delivery statistics (e.g. run length) 863 while failing to attain the specified data rate it must be treated as 864 an inconclusive result, because we can not a priori determine if the 865 reduced data rate was caused by a TCP problem or a network problem, 866 or if the reduced data rate had a material effect on the delivery 867 statistics themselves.

869 Note that for load tests such as this example, if the observed 870 delivery statistics fail to meet the targets, the test can be 871 considered to have failed, because it doesn't really matter 872 that the test didn't attain the required data rate.

874 The really important new properties of MBM, such as vantage 875 independence, are a direct consequence of opening the control loops 876 in the protocols, such that the test traffic does not depend on 877 network conditions or traffic received. Any mechanism that 878 introduces feedback between the traffic measurements and the traffic 879 generation is at risk of introducing nonlinearities that spoil these 880 properties. Any exceptional event that indicates that such feedback 881 has happened should cause the test to be considered inconclusive.

883 One way to view inconclusive tests is that they reflect situations 884 where a test outcome is ambiguous between limitations of the network 885 and some unknown limitation of the diagnostic test itself, which may 886 have been caused by some uncontrolled feedback from the network.

888 Note that procedures that attempt to sweep the target parameter space 889 to find the limits on some parameter (for example to find the highest 890 data rate for a subpath) are likely to break the location independent 891 properties of Model Based Metrics, because the boundary between 892 passing and inconclusive is generally sensitive to RTT. This 893 interaction is because TCP's ability to compensate for flaws in the 894 network scales with the number of round trips per second. Repeating 895 the same procedure from a different vantage point with a larger RTT 896 is likely to get a different result, because with the larger RTT, TCP will 897 control the data rate less accurately.
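One hedged way to encode the outcome logic described above is sketched below. The function and flag names are illustrative only; a FSTDS would define the actual criteria.

<CODE BEGINS>
# Python sketch: scoring one load test per the discussion above.
# All names are illustrative; they are not part of any specification.

def score_test(preconditions_ok, rate_ok, stats_conclusive, stats_ok):
    """preconditions_ok: Section 6.3 preconditions were met.
    rate_ok: the precomputed pattern/data rate was accurately generated.
    stats_conclusive: the run length hypothesis test reached a verdict.
    stats_ok: the delivery statistics met the model targets."""
    if not preconditions_ok:
        return "inconclusive"
    if stats_conclusive and not stats_ok:
        # Failing delivery statistics dominate: the test fails even
        # if the required data rate was never attained.
        return "fail"
    if not rate_ok or not stats_conclusive:
        # Cannot separate a tester or TCP problem from a network
        # problem, so no conclusion can be drawn.
        return "inconclusive"
    return "pass"
<CODE ENDS>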
899 One of the goals for evolving TDS designs will be to keep sharpening 900 the distinction between inconclusive, passing and failing tests. The 901 criteria for passing, failing and inconclusive tests MUST be 902 explicitly stated for every test in the TDS or FSTDS.

904 One of the goals of evolving the testing process, procedures, tools 905 and measurement point selection should be to minimize the number of 906 inconclusive tests.

908 It may be useful to keep raw data delivery statistics for deeper 909 study of the behavior of the network path and to measure the tools. 910 Raw delivery statistics can help to drive tool evolution. Under some 911 conditions it might be possible to reevaluate the raw data for 912 satisfying alternate performance targets. However it is important to 913 guard against sampling bias and other implicit feedback which can 914 cause false results and exhibit measurement point vantage 915 sensitivity.

917 6.2.2. Statistical criteria for measuring run_length

919 When evaluating the observed run_length, we need to determine 920 appropriate packet stream sizes and acceptable error levels for 921 efficient measurement. In practice, can we compare the empirically 922 estimated packet loss and ECN marking probabilities with the targets 923 as the sample size grows? How large a sample is needed to say that 924 the measurements of packet transfer indicate a particular run length 925 is present?

927 The generalized measurement can be described as recursive testing: 928 send packets (individually or in patterns) and observe the packet 929 delivery performance (loss ratio or other metric, any marking we 930 define).

932 As each packet is sent and measured, we have an ongoing estimate of 933 the performance in terms of the ratio of packet loss or ECN mark to 934 total packets (i.e. an empirical probability). We continue to send 935 until conditions support a conclusion or a maximum sending limit has 936 been reached.

938 We have a target_mark_probability, 1 mark per target_run_length, 939 where a "mark" is defined as a lost packet, a packet with ECN mark, 940 or other signal. This constitutes the null Hypothesis:

942 H0: no more than one mark in target_run_length = 943 3*(target_pipe_size)^2 packets

945 and we can stop sending packets if on-going measurements support 946 accepting H0 with the specified Type I error = alpha (= 0.05 for 947 example).

949 We also have an alternative Hypothesis to evaluate: whether performance is 950 significantly lower than the target_mark_probability. Based on 951 analysis of typical values and practical limits on measurement 952 duration, we choose four times the H0 probability:

954 H1: one or more marks in (target_run_length/4) packets

956 and we can stop sending packets if measurements support rejecting H0 957 with the specified Type II error = beta (= 0.05 for example), thus 958 preferring the alternate hypothesis H1.

960 H0 and H1 constitute the Success and Failure outcomes described 961 elsewhere in the memo, and while the ongoing measurements do not 962 support either hypothesis the current status of measurements is 963 inconclusive.

965 The problem above is formulated to match the Sequential Probability 966 Ratio Test (SPRT) [StatQC]. Note that as originally framed the 967 events under consideration were all manufacturing defects. In 968 networking, ECN marks and lost packets are not defects but signals, 969 indicating that the transport protocol should slow down.
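For concreteness, a minimal sketch of this sequential procedure follows; the boundary constants implement the equations spelled out in the next paragraphs, and the alpha and beta defaults are the example values above.

<CODE BEGINS>
# Python sketch of the SPRT described above; the h1, h2, s constants
# follow the equations given in the next paragraphs.

from math import log

def sprt_constants(target_run_length, alpha=0.05, beta=0.05):
    p0 = 1.0 / target_run_length      # H0: one mark per run length
    p1 = 4.0 / target_run_length      # H1: one mark per run_length/4
    k = log((p1 * (1 - p0)) / (p0 * (1 - p1)))
    s = log((1 - p0) / (1 - p1)) / k
    h1 = log((1 - alpha) / beta) / k
    h2 = log((1 - beta) / alpha) / k
    return h1, h2, s

def sprt_decision(n, mark_count, h1, h2, s):
    """Evaluate the stopping rules after n packets and mark_count marks."""
    if mark_count <= -h1 + s * n:
        return "accept H0"    # run length meets the target
    if mark_count >= h2 + s * n:
        return "accept H1"    # run length is significantly worse
    return "continue"
<CODE ENDS>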
971 The Sequential Probability Ratio Test also starts with a pair of 972 hypotheses specified as above:

974 H0: p0 = one defect in target_run_length 975 H1: p1 = one defect in target_run_length/4

976 As packets are sent and measurements collected, the tester evaluates 977 the cumulative defect count against two boundaries representing H0 978 Acceptance or Rejection (and acceptance of H1):

980 Acceptance line: Xa = -h1 + sn 981 Rejection line: Xr = h2 + sn

982 where n increases linearly for each packet sent and

984 h1 = { log((1-alpha)/beta) }/k 985 h2 = { log((1-beta)/alpha) }/k 986 k = log{ (p1(1-p0)) / (p0(1-p1)) } 987 s = [ log{ (1-p0)/(1-p1) } ]/k

988 for p0 and p1 as defined in the null and alternative Hypotheses 989 statements above, and alpha and beta as the Type I and Type II errors.

991 The SPRT specifies simple stopping rules:

993 o Xa < defect_count(n) < Xr: continue testing 994 o defect_count(n) <= Xa: Accept H0 995 o defect_count(n) >= Xr: Accept H1

997 The calculations above are implemented in the R-tool for Statistical 998 Analysis [Rtool], in the add-on package for Cross-Validation via 999 Sequential Testing (CVST) [CVST].

1001 Using the equations above, we can calculate the minimum number of 1002 packets (n) needed to accept H0 when x defects are observed. For 1003 example, when x = 0:

1005 Xa = 0 = -h1 + sn 1006 and n = h1 / s

1008 6.2.2.1. Alternate criteria for measuring run_length

1010 An alternate calculation, contributed by Alex Gilgur (Google).

1012 The probability of failure within an interval whose length is 1013 target_run_length is given by an exponential distribution with rate = 1014 1 / target_run_length (a memoryless process). The implication of 1015 this is that the predicted probability depends on the total count of 1016 packets that have been through the pipe, the formula being:

1018 P(t1 < T < t2) = R(t1) - R(t2),

1020 where

1022 T = number of packets at which a failure will occur with probability P; 1023 t = number of packets: 1024 t1 = number of packets (e.g., when failure last occurred) 1025 t2 = t1 + target_run_length 1026 R = survival function: 1027 R(t1) = exp (-t1/target_run_length) 1028 R(t2) = exp (-t2/target_run_length)

1030 The algorithm:

1032 initialize the packet.counter = 0
1033 initialize the failed.packet.counter = 0
1034 start the loop
1035     if packet_response = ACK:
1036         increment the packet.counter
1037     else:
1038         ### The packet failed
1039         increment the packet.counter
1040         increment the failed.packet.counter
1042         P_fail_observed = failed.packet.counter / packet.counter
1044         upper_bound = packet.counter + target.run.length / 2
1045         lower_bound = packet.counter - target.run.length / 2
1047         R1 = exp( -upper_bound / target.run.length )
1048         R0 = exp( -max(0, lower_bound) / target.run.length )
1050         P_fail_predicted = R0 - R1
1051         Compare P_fail_observed vs. P_fail_predicted
1052     end-if
1053 continue the loop

1055 This algorithm allows accurate comparison of the observed failure 1056 probability with the corresponding values predicted based on a fixed 1057 target_failure_rate, which is equal to 1.0 / target_run_length.

1059 6.2.3. Reordering Tolerance

1061 All tests must be instrumented for packet level reordering [RFC4737]. 1062 However, there is no consensus for how much reordering should be 1063 acceptable. Over the last two decades the general trend has been to 1064 make protocols and applications more tolerant to reordering (see for 1065 example [RFC4015]), in response to the gradual increase in reordering 1066 in the network.
6.2.3.  Reordering Tolerance

   All tests must be instrumented for packet level reordering [RFC4737].  However, there is no consensus for how much reordering should be acceptable.  Over the last two decades the general trend has been to make protocols and applications more tolerant to reordering (see for example [RFC4015]), in response to the gradual increase in reordering in the network.  This increase has been due to the gradual deployment of technologies such as multi-threaded routing lookups and Equal Cost Multipath (ECMP) routing.  These techniques increase parallelism in the network and are critical to enabling overall Internet growth to exceed Moore's Law.

   Note that transport retransmission strategies can trade off reordering tolerance vs. how quickly they can repair losses vs. overhead from spurious retransmissions.  In advance of new retransmission strategies we propose the following strawman: transport protocols should be able to adapt to reordering as long as the reordering extent is no more than the maximum of one half window or 1 ms, whichever is larger.  Within this limit on reorder extent, there should be no bound on reordering density.

   By implication, reordering which is less than these bounds should not be treated as a network impairment.  However [RFC4737] still applies: reordering should be instrumented and the maximum reordering that can be properly characterized by the test (e.g. bound on history buffers) should be recorded with the measurement results.

   Reordering tolerance and diagnostic bounds must be specified in a FSTDS.
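   As an illustration only (the memo does not define this calculation; converting the 1 ms time bound into packets via the bottleneck rate is our assumption), the strawman limit might be evaluated as:

      def reordering_within_strawman(extent_packets, window_packets,
                                     bottleneck_rate_pps):
          """True if the observed reordering extent is within the
          strawman bound: max(half the window, 1 ms of packets)."""
          one_ms_packets = bottleneck_rate_pps * 0.001
          limit = max(window_packets / 2.0, one_ms_packets)
          return extent_packets <= limit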
6.3.  Test Preconditions

   Many tests have preconditions which are required to assure their validity, for example the presence or absence of cross traffic on specific subpaths, or appropriate preloading to put reactive network elements into the proper states [I-D.ietf-ippm-2330-update].  If preconditions are not properly satisfied for some reason, the tests should be considered to be inconclusive.  In general it is useful to preserve diagnostic information about why the preconditions were not met, and the test data that was collected, if any.

   It is important to preserve the record that a test was scheduled, because otherwise precondition enforcement mechanisms can introduce sampling bias.  For example, canceling tests due to load on subscriber access links may introduce sampling bias for tests of the rest of the network by reducing the number of tests during peak network load.

   Test preconditions and failure actions must be specified in a FSTDS.

7.  Diagnostic Tests

   The diagnostic tests below are organized by traffic pattern: basic data rate and delivery statistics, standing queues, slowstart bursts, and sender rate bursts.  We also introduce some combined tests which are more efficient when networks are expected to pass, but conflate diagnostic signatures when they fail.

   There are a number of test details which are not fully defined here.  They must be fully specified in a FSTDS.  From a standardization perspective, this lack of specificity will weaken this version of Model Based Metrics; however, it is anticipated that this weakness will be more than offset by the extent to which MBM suppresses the problems caused by using transport protocols for measurement, e.g. non-specific MBM metrics are likely to have better repeatability than many existing BTC-like metrics.  Once we have good field experience, the missing details can be fully specified.

7.1.  Basic Data Rate and Delivery Statistics Tests

   We propose several versions of the basic data rate and delivery statistics test.  All measure the number of packets delivered between losses or ECN marks, using a data stream that is rate controlled at or below the target_data_rate.

   The tests below differ in how the data rate is controlled.  The data can be paced on a timer, or window controlled at the full target data rate.  The first two tests implicitly confirm that the subpath has sufficient raw capacity to carry the target_data_rate.  They are recommended for relatively infrequent testing, such as an installation or periodic auditing process.  The third, background delivery statistics, is a low rate test designed for ongoing monitoring for changes in subpath quality.

   All rely on the receiver accumulating packet delivery statistics as described in Section 6.2.2 to score the outcome:

      Pass: it is statistically significant that the observed interval
      between losses or ECN marks is larger than the
      target_run_length.

      Fail: it is statistically significant that the observed interval
      between losses or ECN marks is smaller than the
      target_run_length.

   A test is considered to be inconclusive if it failed to meet the data rate as specified below, failed to meet the qualifications defined in Section 6.3, or if neither run length statistical hypothesis was confirmed in the allotted test duration.

7.1.1.  Delivery Statistics at Paced Full Data Rate

   Confirm that the observed run length is at least the target_run_length while relying on a timer to send data at the target_data_rate, using the procedure described in Section 6.1.1 with a burst size of 1 (single packets) or 2 (packet pairs).

   The test is considered to be inconclusive if the packet transmission cannot be accurately controlled for any reason.

   RFC 6673 [RFC6673] is appropriate for measuring delivery statistics at full data rate.

7.1.2.  Delivery Statistics at Full Data Windowed Rate

   Confirm that the observed run length is at least the target_run_length while sending at an average rate approximately equal to the target_data_rate, by controlling (or clamping) the window size of a conventional transport protocol to a fixed value computed from the properties of the test path, typically test_window=target_data_rate*test_RTT/target_MTU.  Note that if there is any interaction between the forward and return path, test_window may need to be adjusted slightly to compensate for the resulting inflated RTT.  A sketch of this computation appears at the end of this section.

   Since losses and ECN marks generally cause transport protocols to at least temporarily reduce their data rates, this test is expected to be less precise about controlling its data rate.  It should not be considered inconclusive as long as at least some of the round trips reached the full target_data_rate without incurring losses or ECN marks.  To pass this test the network MUST deliver target_pipe_size packets in target_RTT time without any losses or ECN marks at least once per two target_pipe_size round trips, in addition to meeting the run length statistical test.
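   For illustration only (the memo does not mandate units or rounding; both are our assumptions here), the window clamp might be computed as:

      import math

      def test_window(target_data_rate_bps, test_rtt_s,
                      target_mtu_bytes):
          """test_window = target_data_rate * test_RTT / target_MTU,
          rounded up to whole packets."""
          bytes_per_rtt = (target_data_rate_bps / 8.0) * test_rtt_s
          return math.ceil(bytes_per_rtt / target_mtu_bytes)

      print(test_window(2.5e6, 0.010, 1500))  # -> 3 packets for a
                                              #    10 ms test path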
7.1.3.  Background Delivery Statistics Tests

   The background run length is a low rate version of the target rate test above, designed for ongoing lightweight monitoring for changes in the observed subpath run length without disrupting users.  It should be used in conjunction with one of the above full rate tests because it does not confirm that the subpath can support the raw target data rate.

   RFC 6673 [RFC6673] is appropriate for measuring background delivery statistics.

7.2.  Standing Queue Tests

   These tests confirm that the bottleneck is well behaved across the onset of packet loss, which typically follows the onset of queueing.  Well behaved generally means lossless for transient queues, but once the queue has been sustained for a sufficient period of time (or reaches a sufficient queue depth) there should be a small number of losses to signal to the transport protocol that it should reduce its window.  Losses that are too early can prevent the transport from averaging at the target_data_rate.  Losses that are too late indicate that the queue might be subject to bufferbloat [wikiBloat] and inflict excess queueing delays on all flows sharing the bottleneck queue.  Excess losses (more than a few per RTT) make loss recovery problematic for the transport protocol.  Non-linear or erratic RTT fluctuations suggest poor interactions between the channel acquisition algorithms and the transport self clock.  All of the tests in this section use the same basic scanning algorithm, described here, but score the link on the basis of how well it avoids each of these problems.

   For some technologies the data might not be subject to increasing delays, in which case the data rate will vary with the window size all the way up to the onset of load induced losses or ECN marks.  For these technologies, the discussion of queueing does not apply, but it is still required that the onset of losses (or ECN marks) be at an appropriate point and progressive.

   Use the procedure in Section 6.1.3 to sweep the window across the onset of queueing and the onset of loss.  The tests below all assume that the scan emulates standard additive increase and delayed ACK by incrementing the window by one packet for every 2*target_pipe_size packets delivered.  A scan can typically be divided into three regions: below the onset of queueing, a standing queue, and at or beyond the onset of loss.

   Below the onset of queueing the RTT is typically fairly constant, and the data rate varies in proportion to the window size.  Once the data rate reaches the link rate, the data rate becomes fairly constant, and the RTT increases in proportion to the increase in window size.  The precise transition across the start of queueing can be identified by the maximum network power, defined to be the ratio of the data rate over the RTT.  The network power can be computed at each window size, and the window with the maximum is taken as the start of the queueing region.

   For technologies that do not have conventional queues, start the scan at a window equal to test_window=target_data_rate*test_RTT/target_MTU, i.e. starting at the target rate, instead of the power point.
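   A minimal sketch of the power point computation (our illustration; the layout of the scan results is an assumption):

      def onset_of_queueing(scan):
          """Given scan results as (window_packets, data_rate_bps,
          rtt_s) tuples, return the window with maximum network
          power, defined as data rate divided by RTT."""
          return max(scan, key=lambda sample: sample[1] / sample[2])[0]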
   If there is random background loss (e.g. bit errors, etc.), precise determination of the onset of queue induced packet loss may require multiple scans.  Above the onset of queueing loss, all transport protocols are expected to experience periodic losses determined by the interaction between the congestion control and AQM algorithms.  For standard congestion control algorithms the periodic losses are likely to be relatively widely spaced and the details are typically dominated by the behavior of the transport protocol itself.  For the stiffened transport protocol case (with non-standard, aggressive congestion control algorithms) the details of periodic losses will be dominated by how the window increase function responds to loss.

7.2.1.  Congestion Avoidance

   A link passes the congestion avoidance standing queue test if more than target_run_length packets are delivered between the onset of queueing (as determined by the window with the maximum network power) and the first loss or ECN mark.  If this test is implemented using a standard congestion control algorithm with a clamp, it can be used in situ in the production Internet as a capacity test.  For an example of such a test see [Pathdiag].

   For technologies that do not have conventional queues, use the test_window in place of the onset of queueing, i.e. a link passes the congestion avoidance standing queue test if more than target_run_length packets are delivered between the start of the scan at test_window and the first loss or ECN mark.

7.2.2.  Bufferbloat

   This test confirms that there is some mechanism to limit buffer occupancy (e.g. that prevents bufferbloat).  Note that this is not strictly a requirement for single stream bulk performance, however if there is no mechanism to limit buffer queue occupancy then a single stream with sufficient data to deliver is likely to cause the problems described in [RFC2309] and [wikiBloat].  This may cause only minor symptoms for the dominant flow, but has the potential to make the link unusable for other flows and applications.

   Pass if the onset of loss occurs before a standing queue has introduced more delay than twice the target_RTT, or another well defined and specified limit.  Note that there is not yet a model for how much standing queue is acceptable.  The factor of two chosen here reflects a rule of thumb.  In conjunction with the previous test, this test implies that the first loss should occur at a queueing delay which is between one and two times the target_RTT.

   Specified RTT limits that are larger than twice the target_RTT must be fully justified in the FSTDS.
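   The pass criteria of Sections 7.2.1 and 7.2.2 might be scored together as in the following sketch (the instrumentation names are our assumptions, not normative procedure):

      def standing_queue_score(pkts_between_onset_and_first_loss,
                               queue_delay_at_first_loss_s,
                               target_run_length, target_rtt_s):
          """Score one standing queue scan: the run between the
          onset of queueing and the first loss must exceed the
          target_run_length (Section 7.2.1), and the standing queue
          delay at the first loss must stay below twice the
          target_RTT (Section 7.2.2)."""
          congestion_avoidance_ok = (
              pkts_between_onset_and_first_loss > target_run_length)
          bufferbloat_ok = (
              queue_delay_at_first_loss_s < 2 * target_rtt_s)
          return congestion_avoidance_ok and bufferbloat_ok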
7.2.3.  Non excessive loss

   This test confirms that the onset of loss is not excessive.  Pass if losses are equal to or less than the increase in the cross traffic plus the test traffic window increase on the previous RTT.  This could be restated as non-decreasing link throughput at the onset of loss, which is easy to meet as long as discarding packets is not more expensive than delivering them.  (Note that when there is a transient drop in link throughput, outside of a standing queue test, a link that passes other queue tests in this document will have sufficient queue space to hold one RTT worth of data.)

7.2.4.  Duplex Self Interference

   This engineering test confirms a bound on the interactions between the forward data path and the ACK return path.

   Some historical half duplex technologies had the property that each direction held the channel until it completely drained its queue.  When a self clocked transport protocol, such as TCP, has data and ACKs passing in opposite directions through such a link, the behavior often reverts to stop-and-wait.  Each additional packet added to the window raises the observed RTT by two forward path packet times, once as it passes through the data path, and once for the additional delay incurred by the ACK waiting on the return path.

   The duplex self interference test fails if the RTT rises by more than some fixed bound above the expected queueing time computed from the excess window divided by the link data rate.

7.3.  Slowstart tests

   These tests mimic slowstart: data is sent at twice the effective bottleneck rate to exercise the queue at the dominant bottleneck.

   In general they are deemed inconclusive if the elapsed time to send the data burst is not less than half of the time to receive the ACKs (i.e. sending data too fast is OK, but sending it slower than twice the actual bottleneck rate as indicated by the ACKs is deemed inconclusive).  Space the bursts such that the average data rate is equal to the target_data_rate.
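   The implied burst timing can be sketched as follows (our arithmetic and names; the memo does not prescribe an implementation):

      def slowstart_burst_schedule(burst_packets, target_mtu_bytes,
                                   effective_bottleneck_rate_bps,
                                   target_data_rate_bps):
          """Return (burst_duration_s, headway_s): each burst is
          sent at twice the effective bottleneck rate, and bursts
          are spaced so the long term average rate equals the
          target_data_rate."""
          burst_bits = burst_packets * target_mtu_bytes * 8
          burst_duration = (burst_bits /
                            (2.0 * effective_bottleneck_rate_bps))
          headway = burst_bits / target_data_rate_bps
          return burst_duration, headway

   For example, an 11 packet burst of 1500 byte packets over a 10 Mb/s effective bottleneck yields a 6.6 ms burst every 52.8 ms for a 2.5 Mb/s target (values illustrative only).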
7.3.1.  Full Window slowstart test

   This is a capacity test to confirm that slowstart is not likely to exit prematurely.  Send slowstart bursts that are target_pipe_size total packets.

   Accumulate packet delivery statistics as described in Section 6.2.2 to score the outcome.  Pass if it is statistically significant that the observed interval between losses or ECN marks is larger than the target_run_length.  Fail if it is statistically significant that the observed interval between losses or ECN marks is smaller than the target_run_length.

   Note that these are the same parameters as the Sender Full Window burst test, except the burst rate is at the slowstart rate, rather than the sender interface rate.

7.3.2.  Slowstart AQM test

   Do a continuous slowstart (send data continuously at slowstart_rate) until the first loss, stop, allow the network to drain and repeat, gathering statistics on the last packet delivered before the loss, the loss pattern, the maximum observed RTT and window size.  Justify the results.  There is not currently sufficient theory to justify requiring any particular result, however design decisions that affect the outcome of this test also affect how the network balances between long and short flows (the "mice and elephants" problem).  The queueing delay at the time of the first loss should be at least one half of the target_RTT.

   This is an engineering test: it would be best performed on a quiescent network or testbed, since cross traffic has the potential to change the results.

7.4.  Sender Rate Burst tests

   These tests determine how well the network can deliver bursts sent at the sender's interface rate.  Note that this test most heavily exercises the front path, and is likely to include infrastructure that may be out of scope for a subscriber ISP.

   Also, there are several details that are not precisely defined.  For starters there is not a standard server interface rate.  1 Gb/s and 10 Gb/s are very common today, but higher rates will become cost effective and can be expected to be dominant some time in the future.

   Current standards permit TCP to send full window bursts following an application pause.  (Congestion Window Validation [RFC2861] is not required, but even if it was, it does not take effect until an application pause is longer than an RTO.)  Since full window bursts are consistent with standard behavior, it is desirable that the network be able to deliver such bursts, otherwise application pauses will cause unwarranted losses.  Note that the AIMD sawtooth requires a peak window that is twice target_pipe_size, so the worst case burst may be 2*target_pipe_size.

   It is also understood in the application and serving community that interface rate bursts have a cost to the network that has to be balanced against other costs in the servers themselves.  For example TCP Segmentation Offload (TSO) reduces server CPU in exchange for larger network bursts, which increase the stress on network buffer memory.

   There is not yet theory to unify these costs or to provide a framework for trying to optimize global efficiency.  We do not yet have a model for how much the network should tolerate server rate bursts.  Some bursts must be tolerated by the network, but it is probably unreasonable to expect the network to be able to efficiently deliver all data as a series of bursts.

   For this reason, this is the only test for which we explicitly encourage derating.  A TDS should include a table of pairs of derating parameters: what burst size to use as a fraction of the target_pipe_size, and how much each burst size is permitted to reduce the run length, relative to the target_run_length.
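   For example, such a table might take the following form (the values here are entirely hypothetical and are shown only to illustrate the form; a real TDS must derive and justify its own entries):

      # Hypothetical derating table: (burst size as a fraction of
      # target_pipe_size, permitted run length as a fraction of
      # target_run_length).
      derating_table = [
          (1.00, 0.50),   # full-size bursts may halve the run length
          (0.50, 0.80),
          (0.25, 1.00),   # small bursts must meet the full target
      ]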
7.5.  Combined Tests

   Combined tests efficiently confirm multiple network properties in a single test, possibly as a side effect of production content delivery.  They require less measurement traffic than other testing strategies at the cost of conflating diagnostic signatures when they fail.  These are by far the most efficient for testing networks that are expected to pass all tests.

7.5.1.  Sustained burst test

   The sustained burst test implements a combined worst case version of all of the capacity tests above.  In its simplest form, send target_pipe_size bursts of packets at the server interface rate with target_RTT headway (burst start to burst start).  Verify that the observed delivery statistics meet the target_run_length.  Key observations:

   o  The subpath under test is expected to go idle for some fraction
      of the time: (subpath_data_rate-target_rate)/subpath_data_rate.
      Failing to do so indicates a problem with the procedure and an
      inconclusive test result.
   o  The burst sensitivity can be derated by sending smaller bursts
      more frequently.  E.g. send target_pipe_size*derate packet
      bursts every target_RTT*derate.
   o  When not derated this test is more strenuous than the slowstart
      capacity tests.
   o  A link that passes this test is likely to be able to sustain
      higher rates (close to subpath_data_rate) for paths with RTTs
      significantly smaller than the target_RTT.  Offsetting this
      performance underestimation is part of the rationale behind
      permitting derating in general.
   o  This test can be implemented with instrumented TCP [RFC4898],
      using a specialized measurement application at one end
      [MBMSource] and a minimal service at the other end [RFC0863]
      [RFC0864].  A prototype tool exists and is under evaluation.
   o  This test is efficient to implement, since it does not require
      per-packet timers, and can make use of TSO in modern NIC
      hardware.
   o  This test is not completely sufficient: the standing window
      engineering tests are also needed to ensure that the link is
      well behaved at and beyond the onset of congestion.  Links that
      exhibit punitive behaviors such as sudden high loss under
      overload may not interact well with TCP's self clock.
   o  Assuming the link passes the relevant standing window
      engineering tests (particularly that it has a progressive onset
      of loss at an appropriate queue depth), a passing sustained
      burst test is (believed to be) sufficient to verify that the
      subpath will not impair a stream at the target performance
      under all conditions.  Proving this statement is the subject of
      ongoing research.

   Note that this test is clearly independent of the subpath RTT, or other details of the measurement infrastructure, as long as the measurement infrastructure can accurately and reliably deliver the required bursts to the subpath under test.

7.5.2.  Streaming Media

   Model Based Metrics can be implemented as a side effect of serving any non-throughput maximizing traffic*, such as streaming media, with some additional controls and instrumentation in the servers.  The essential requirement is that the traffic be constrained such that even with arbitrary application pauses, bursts and data rate fluctuations, the traffic stays within the envelope defined by the individual tests described above.

   If the serving_data_rate is less than or equal to the target_data_rate and the serving_RTT (the RTT between the sender and client) is less than the target_RTT, this constraint is most easily implemented by clamping the transport window size to be no larger than:

      serving_window_clamp=target_data_rate*serving_RTT/
                           (target_MTU-header_overhead)

   Under the above constraints the serving_window_clamp will limit both the serving data rate and burst sizes to be no larger than specified by the procedures in Section 7.1.2 and Section 7.4 or Section 7.5.1.  Since the serving RTT is smaller than the target_RTT, the worst case bursts that might be generated under these conditions will be smaller than called for by Section 7.4, and the sender rate burst sizes are implicitly derated by the serving_window_clamp divided by the target_pipe_size at the very least.  (The traffic might be smoother than specified by the sender interface rate bursts test.)
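   For illustration only (units, rounding and names are our assumptions), the clamp for a 2.5 Mb/s target served over a 20 ms serving path might be computed as:

      import math

      def serving_window_clamp(target_data_rate_bps, serving_rtt_s,
                               target_mtu_bytes,
                               header_overhead_bytes):
          """serving_window_clamp = target_data_rate * serving_RTT /
          (target_MTU - header_overhead), in whole packets, rounded
          down so the clamp never exceeds the envelope."""
          payload = target_mtu_bytes - header_overhead_bytes
          bytes_per_rtt = (target_data_rate_bps / 8.0) * serving_rtt_s
          return math.floor(bytes_per_rtt / payload)

      print(serving_window_clamp(2.5e6, 0.020, 1500, 64))
      # -> 4 packets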
   Note that it is important that the target_data_rate be above the actual average rate needed by the application so it can recover after transient pauses caused by congestion or the application itself.

   In an alternative implementation the data rate and bursts might be explicitly controlled by a host shaper or pacing at the sender.  This would provide better control over transmissions but it is substantially more complicated to implement and would be likely to have a higher CPU overhead.

   * Note that these techniques can be applied to any content delivery that can be subjected to a reduced data rate in order to inhibit TCP equilibrium behavior.

8.  An Example

   In this section we illustrate a TDS designed to confirm that an access ISP can reliably deliver HD video from multiple content providers to all of its customers.  With modern codecs, HD video generally fits in 2.5 Mb/s [@@HDvideo].  Due to its geographical size, network topology and modem designs, the ISP determines that most content is within a 50 ms RTT of its users.  (This is a sufficient RTT to cover continental Europe or either US coast from a single serving site.)

   2.5 Mb/s over a 50 ms path

      +----------------------+-------+---------+
      | End to End Parameter | value | units   |
      +----------------------+-------+---------+
      | target_rate          | 2.5   | Mb/s    |
      | target_RTT           | 50    | ms      |
      | target_MTU           | 1500  | bytes   |
      | header_overhead      | 64    | bytes   |
      | target_pipe_size     | 11    | packets |
      | target_run_length    | 363   | packets |
      +----------------------+-------+---------+

                                Table 1

   Table 1 shows the default TCP model with no derating, and as such is quite conservative.  The simplest TDS would be to use the sustained burst test, described in Section 7.5.1.  Such a test would send 11 packet bursts every 50 ms, and confirm that there was no more than 1 packet loss per 33 bursts (363 total packets in 1.650 seconds).

   Since this number represents the entire end-to-end loss budget, independent subpath tests could be implemented by apportioning the loss rate across subpaths.  For example 50% of the losses might be allocated to the access or last mile link to the user, 40% to the interconnects with other ISPs and 1% to each internal hop (assuming no more than 10 internal hops).  Then all of the subpaths can be tested independently, and the spatial composition of passing subpaths would be expected to be within the end-to-end loss budget.

   Testing interconnects has generally been problematic: conventional performance tests run between Measurement Points adjacent to either side of the interconnect are not generally useful.  Unconstrained TCP tests, such as netperf tests [@@netperf], are typically overly aggressive because the RTT is so small (often less than 1 ms).  These tools are likely to report inflated numbers by pushing other traffic off of the network.  As a consequence they are useless for predicting actual user performance, and may themselves be quite disruptive.  Model Based Metrics solve this problem: the same test pattern as used on other links can be applied to the interconnect.  For our example, when apportioned 40% of the losses, 11 packet bursts sent every 50 ms should have fewer than one loss per 82 bursts (902 packets).
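   The table entries and burst schedule above can be reproduced with the following sketch (our arithmetic, using the reference model target_run_length = 3*(target_pipe_size)^2 from Section 5.2):

      import math

      target_rate_bps = 2.5e6
      target_rtt_s = 0.050
      target_mtu_bytes = 1500

      bytes_per_rtt = (target_rate_bps / 8.0) * target_rtt_s
      target_pipe_size = math.ceil(
          bytes_per_rtt / target_mtu_bytes)            # 11 packets
      target_run_length = 3 * target_pipe_size ** 2    # 363 packets

      bursts = target_run_length // target_pipe_size   # 33 bursts
      duration_s = bursts * target_rtt_s               # 1.65 seconds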
9.  Validation

   Since some aspects of the models are likely to be too conservative, Section 5.2 permits alternate protocol models and Section 5.3 permits test parameter derating.  If either of these techniques is used, we require demonstrations that such a TDS can robustly detect links that will prevent authentic applications using state-of-the-art protocol implementations from meeting the specified performance targets.  This correctness criterion is potentially difficult to prove, because it implicitly requires validating a TDS against all possible links and subpaths.  The procedures described here are still experimental.

   We suggest two approaches, both of which should be applied: first, publish a fully open description of the TDS, including what assumptions were used and how it was derived, such that the research community can evaluate the design decisions, test them and comment on their applicability; and second, demonstrate that authentic applications running over an infinitesimally passing testbed do meet the performance targets.

   An infinitesimally passing testbed resembles an epsilon-delta proof in calculus.  Construct a test network such that all of the individual tests of the TDS pass by only small (infinitesimal) margins, and demonstrate that a variety of authentic applications running over real TCP implementations (or other protocols as appropriate) meet the end-to-end target parameters over such a network.  The workloads should include multiple types of streaming media and transaction oriented short flows (e.g. synthetic web traffic).

   For example, for the HD streaming video TDS described in Section 8, the link layer bottleneck data rate should be exactly the header overhead above 2.5 Mb/s, the per packet random background loss probability should be 1/363 (for a run length of 363 packets), the bottleneck queue should be 11 packets and the front path should have just enough buffering to withstand 11 packet interface rate bursts.  We want every one of the TDS tests to fail if we slightly increase the relevant test parameter, so for example sending a 12 packet burst should cause excess (possibly deterministic) packet drops at the dominant queue at the bottleneck.  On this infinitesimally passing network it should be possible for a real application using a stock TCP implementation in the vendor's default configuration to attain 2.5 Mb/s over a 50 ms path.

   The most difficult part of setting up such a testbed is arranging for it to infinitesimally pass the individual tests.  Two approaches: constraining the network devices not to use all available resources (e.g. by limiting available buffer space or data rate); and preloading subpaths with cross traffic.  Note that it is important that a single environment be constructed which infinitesimally passes all tests at the same time, otherwise there is a chance that TCP can exploit extra latitude in some parameters (such as data rate) to partially compensate for constraints in other parameters (such as queue space), or vice versa.

   To the extent that a TDS is used to inform public dialog it should be fully publicly documented, including the details of the tests, what assumptions were used and how it was derived.  All of the details of the validation experiment should also be published with sufficient detail for the experiments to be replicated by other researchers.  All components should either be open source or fully described proprietary implementations that are available to the research community.

10.  Acknowledgements

   Ganga Maguluri suggested the statistical test for measuring loss probability in the target run length.  Alex Gilgur helped with the statistics and contributed the alternate model.

   Meredith Whittaker improved the clarity of the communications.

   This work was inspired by Measurement Lab: open tools running on an open platform, using open tools to collect open data.
   See http://www.measurementlab.net/

11.  Informative References

   [RFC0863]  Postel, J., "Discard Protocol", STD 21, RFC 863, May 1983.

   [RFC0864]  Postel, J., "Character Generator Protocol", STD 22, RFC 864, May 1983.

   [RFC2309]  Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering, S., Estrin, D., Floyd, S., Jacobson, V., Minshall, G., Partridge, C., Peterson, L., Ramakrishnan, K., Shenker, S., Wroclawski, J., and L. Zhang, "Recommendations on Queue Management and Congestion Avoidance in the Internet", RFC 2309, April 1998.

   [RFC2330]  Paxson, V., Almes, G., Mahdavi, J., and M. Mathis, "Framework for IP Performance Metrics", RFC 2330, May 1998.

   [RFC2861]  Handley, M., Padhye, J., and S. Floyd, "TCP Congestion Window Validation", RFC 2861, June 2000.

   [RFC3148]  Mathis, M. and M. Allman, "A Framework for Defining Empirical Bulk Transfer Capacity Metrics", RFC 3148, July 2001.

   [RFC3465]  Allman, M., "TCP Congestion Control with Appropriate Byte Counting (ABC)", RFC 3465, February 2003.

   [RFC4015]  Ludwig, R. and A. Gurtov, "The Eifel Response Algorithm for TCP", RFC 4015, February 2005.

   [RFC4737]  Morton, A., Ciavattone, L., Ramachandran, G., Shalunov, S., and J. Perser, "Packet Reordering Metrics", RFC 4737, November 2006.

   [RFC4898]  Mathis, M., Heffner, J., and R. Raghunarayan, "TCP Extended Statistics MIB", RFC 4898, May 2007.

   [RFC5681]  Allman, M., Paxson, V., and E. Blanton, "TCP Congestion Control", RFC 5681, September 2009.

   [RFC5835]  Morton, A. and S. Van den Berghe, "Framework for Metric Composition", RFC 5835, April 2010.

   [RFC6049]  Morton, A. and E. Stephan, "Spatial Composition of Metrics", RFC 6049, January 2011.

   [RFC6673]  Morton, A., "Round-Trip Packet Loss Metrics", RFC 6673, August 2012.

   [I-D.ietf-ippm-2330-update]
              Fabini, J. and A. Morton, "Advanced Stream and Sampling Framework for IPPM", draft-ietf-ippm-2330-update-05 (work in progress), May 2014.

   [I-D.ietf-ippm-lmap-path]
              Bagnulo, M., Burbridge, T., Crawford, S., Eardley, P., and A. Morton, "A Reference Path and Measurement Points for LMAP", draft-ietf-ippm-lmap-path-04 (work in progress), June 2014.

   [MSMO97]   Mathis, M., Semke, J., Mahdavi, J., and T. Ott, "The Macroscopic Behavior of the TCP Congestion Avoidance Algorithm", Computer Communications Review, volume 27, number 3, July 1997.

   [WPING]    Mathis, M., "Windowed Ping: An IP Level Performance Diagnostic", INET 94, June 1994.

   [mpingSource]
              Fan, X., Mathis, M., and D. Hamon, "Git Repository for mping: An IP Level Performance Diagnostic", Sept 2013.

   [MBMSource]
              Hamon, D., "Git Repository for Model Based Metrics", Sept 2013.

   [Pathdiag] Mathis, M., Heffner, J., O'Neil, P., and P. Siemsen, "Pathdiag: Automated TCP Diagnosis", Passive and Active Measurement, June 2008.

   [StatQC]   Montgomery, D., "Introduction to Statistical Quality Control - 2nd ed.", ISBN 0-471-51988-X, 1990.

   [Rtool]    R Development Core Team, "R: A language and environment for statistical computing", R Foundation for Statistical Computing, Vienna, Austria, ISBN 3-900051-07-0, http://www.R-project.org/, 2011.

   [CVST]     Krueger, T. and M. Braun, "R package: Fast Cross-Validation via Sequential Testing", version 0.1, November 2012.

   [CUBIC]    Ha, S., Rhee, I., and L. Xu, "CUBIC: a new TCP-friendly high-speed TCP variant", SIGOPS Oper. Syst. Rev. 42, 5, July 2008.
   [LMCUBIC]  Ledesma Goyzueta, R. and Y. Chen, "A Deterministic Loss Model Based Analysis of CUBIC", IEEE International Conference on Computing, Networking and Communications (ICNC), E-ISBN: 978-1-4673-5286-4, January 2013.

   [AFD]      Pan, R., Breslau, L., Prabhakar, B., and S. Shenker, "Approximate fairness through differential dropping", SIGCOMM Comput. Commun. Rev. 33, 2, April 2003.

   [wikiBloat]
              Wikipedia, "Bufferbloat", http://en.wikipedia.org/w/index.php?title=Bufferbloat&oldid=608805474, June 2014.

   [CCscaling]
              Paganini, F., Doyle, J., and S. Low, "Scalable laws for stable network congestion control", Proceedings of Conference on Decision and Control, http://www.ee.ucla.edu/~paganini, December 2001.

Appendix A.  Model Derivations

   The reference target_run_length described in Section 5.2 is based on very conservative assumptions: that all window in excess of target_pipe_size contributes to a standing queue that raises the RTT, and that classic Reno congestion control with delayed ACKs is in effect.  In this section we provide two alternative calculations using different assumptions.

   It may seem out of place to allow such latitude in a measurement standard, but this section provides offsetting requirements.

   The estimates provided by these models make the most sense if network performance is viewed logarithmically.  In the operational Internet, data rates span more than 8 orders of magnitude, RTT spans more than 3 orders of magnitude, and loss probability spans at least 8 orders of magnitude.  When viewed logarithmically (as in decibels), these correspond to 80 dB of dynamic range.  On an 80 dB scale, a 3 dB error is less than 4% of the scale, even though it might represent a factor of 2 in the untransformed parameter.

   This document gives a lot of latitude for calculating target_run_length, however people designing a TDS should consider the effect of their choices on the ongoing tussle about the relevance of "TCP friendliness" as an appropriate model for Internet capacity allocation.  Choosing a target_run_length that is substantially smaller than the reference target_run_length specified in Section 5.2 strengthens the argument that it may be appropriate to abandon "TCP friendliness" as the Internet fairness model.  This gives developers incentive and permission to develop even more aggressive applications and protocols, for example by increasing the number of connections that they open concurrently.

A.1.  Queueless Reno

   In Section 5.2 it is assumed that the target rate is the same as the link rate, and any excess window causes a standing queue at the bottleneck.  This might be representative of a non-shared access link.  An alternative situation would be a heavily aggregated subpath where individual flows do not significantly contribute to the queueing delay, and losses are determined by monitoring the average data rate, for example by the use of a virtual queue as in [AFD].  In such a scheme the RTT is constant and TCP's AIMD congestion control causes the data rate to fluctuate in a sawtooth.  If the traffic is being controlled in a manner that is consistent with the metrics here, the goal would be to make the actual average rate equal to the target_data_rate.

   We can derive a model for Reno TCP and delayed ACK under the above set of assumptions: for some value of Wmin, the window will sweep from Wmin packets to 2*Wmin packets in 2*Wmin RTTs.  Unlike the queueing case where Wmin = target_pipe_size, we want the average of Wmin and 2*Wmin to be the target_pipe_size, so that the average rate is the target rate.  Thus we want Wmin = (2/3)*target_pipe_size.

   Between losses each sawtooth delivers (1/2)(Wmin+2*Wmin)(2*Wmin) = 3*Wmin^2 packets in 2*Wmin round trip times.

   Substituting these together we get:

      target_run_length = (4/3)(target_pipe_size^2)

   Note that this is 44% of the reference run length.  This makes sense because under the assumptions in Section 5.2 the AIMD sawtooth caused a queue at the bottleneck, which raised the effective RTT by 50%.
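   The 44% figure can be checked numerically (a sketch using the target_pipe_size from the example in Section 8):

      def reference_run_length(target_pipe_size):
          return 3 * target_pipe_size ** 2       # Section 5.2 model

      def queueless_reno_run_length(target_pipe_size):
          return (4.0 / 3.0) * target_pipe_size ** 2

      tps = 11   # from the Section 8 example
      print(queueless_reno_run_length(tps) /
            reference_run_length(tps))           # -> 0.444...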
A.2.  CUBIC

   CUBIC has three operating regions.  The model for the expected value of the window size derived in [LMCUBIC] assumes operation in the "concave" region only, which is a non-TCP friendly region for long-lived flows.  The authors make the following assumptions: the packet loss probability, p, is independent and periodic, losses occur one at a time, and they are true losses due to tail drop or corruption.  This definition of p aligns very well with our definition of target_run_length and the requirement for progressive loss (AQM).

   Although the CUBIC window increase depends on continuous time, the authors express the time to reach the maximum window size in terms of the RTT and a parameter for the multiplicative rate decrease on observing loss, beta (whose default value is 0.2 in CUBIC).  The expected value of the window size, E[W], is also dependent on C, a parameter of CUBIC that determines its window-growth aggressiveness (values from 0.01 to 4).

      E[W] = ( C*(RTT/p)^3 * ((4-beta)/beta) )^(1/4)

   and, further assuming Poisson arrivals, the mean throughput, x, is

      x = E[W]/RTT

   We note that under these conditions (deterministic single losses), the value of E[W] is always greater than 0.8 of the maximum window size ~= reference_run_length.  @@@@
Appendix B.  Complex Queueing

   For many network technologies simple queueing models do not apply: the network schedules, thins or otherwise alters the timing of ACKs and data, generally to raise the efficiency of the channel allocation process when confronted with relatively widely spaced small ACKs.  These efficiency strategies are ubiquitous for half duplex, wireless and broadcast media.

   Altering the ACK stream generally has two consequences: it raises the effective bottleneck data rate, making slowstart burst at higher rates (possibly as high as the sender's interface rate), and it effectively raises the RTT by the average time that the ACKs were delayed.  The first effect can be partially mitigated by reclocking ACKs once they are beyond the bottleneck on the return path to the sender, however this further raises the effective RTT.

   The most extreme example of this sort of behavior would be a half duplex channel that is not released as long as the end point currently holding the channel has queued traffic.  Such environments cause self clocked protocols under full load to revert to extremely inefficient stop and wait behavior, where they send an entire window of data as a single burst, followed by the entire window of ACKs on the return path.

   If a particular end-to-end path contains a link or device that alters the ACK stream, then the entire path from the sender up to the bottleneck must be tested at the burst parameters implied by the ACK scheduling algorithm.  The most important parameter is the Effective Bottleneck Data Rate, which is the average rate at which the ACKs advance snd.una.  Note that thinning the ACKs (relying on the cumulative nature of seg.ack to permit discarding some ACKs) implies an effectively infinite bottleneck data rate.  It is important to note that due to the self clock, ill conceived channel allocation mechanisms can increase the stress on upstream links in a long path.

   Holding data or ACKs for channel allocation or other reasons (such as error correction) always raises the effective RTT relative to the minimum delay for the path.  Therefore it may be necessary to replace target_RTT in the calculation in Section 5.2 by an effective_RTT, which includes the target_RTT reflecting the fixed part of the path plus a term to account for the extra delays introduced by these mechanisms.

Appendix C.  Version Control

   Formatted: Thu Jul 3 20:19:04 PDT 2014

Authors' Addresses

   Matt Mathis
   Google, Inc
   1600 Amphitheater Parkway
   Mountain View, California  94043
   USA

   Email: mattmathis@google.com

   Al Morton
   AT&T Labs
   200 Laurel Avenue South
   Middletown, NJ  07748
   USA

   Phone: +1 732 420 1571
   Email: acmorton@att.com
   URI:   http://home.comcast.net/~acmacm/