idnits 2.17.1 

draft-ietf-ippm-btc-framework-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Cannot find the required boilerplate sections (Copyright, IPR, etc.) in
     this document.

     Expected boilerplate is as follows today (2024-04-25) according to
     https://trustee.ietf.org/license-info :

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.a:
        This Internet-Draft is submitted in full conformance with the provisions
        of BCP 78 and BCP 79.

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 2:
        Copyright (c) 2024 IETF Trust and the persons identified as the document
        authors.  All rights reserved.

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 3:
        This document is subject to BCP 78 and the IETF Trust's Legal Provisions
        Relating to IETF Documents
        (https://trustee.ietf.org/license-info) in effect on the date of
        publication of this document.  Please review these documents
        carefully, as they describe your rights and restrictions with
        respect to this document.  Code Components extracted from this
        document must include Simplified BSD License text as described in
        Section 4.e of the Trust Legal Provisions and are provided
        without warranty as described in the Simplified BSD License.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** Missing expiration date.  The document expiration date should appear on
     the first and last page.

  ** The document seems to lack a 1id_guidelines paragraph about
     Internet-Drafts being working documents. 

  ** The document seems to lack a 1id_guidelines paragraph about 6 months
     document validity -- however, there's a paragraph with a matching
     beginning. Boilerplate error?

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     current Internet-Drafts. 

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     Shadow Directories. 

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  == The page length should not exceed 58 lines per page, but there was 1
     longer page, the longest (page 1) being 576 lines


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an Abstract section.

  ** The document seems to lack a Security Considerations section.

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.

  ** There are 5 instances of too long lines in the document, the longest one
     being 5 characters in excess of 72.

  ** There are 5 instances of lines with control characters in the document.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- Couldn't find a document date in the document -- date freshness check
     skipped.

  -- Found something which looks like a code comment -- if you have code
     sections in the document, please surround them with '<CODE BEGINS>' and
     '<CODE ENDS>' lines.


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: 'RFC98' is mentioned on line 108, but not defined

  == Unused Reference: 'Flo95' is defined on line 474, but no explicit
     reference was found in the text

  == Unused Reference: 'RF98' is defined on line 477, but no explicit
     reference was found in the text

  == Unused Reference: 'Hoe95' is defined on line 485, but no explicit
     reference was found in the text

  == Unused Reference: 'OKM96b' is defined on line 511, but no explicit
     reference was found in the text

  == Unused Reference: 'Pax97a' is defined on line 516, but no explicit
     reference was found in the text

  == Unused Reference: 'Pax97c' is defined on line 522, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC2001' is defined on line 537, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC2018' is defined on line 545, but no explicit
     reference was found in the text

  == Unused Reference: 'Ste94' is defined on line 553, but no explicit
     reference was found in the text

  -- Possible downref: Non-RFC (?) normative reference: ref. 'LowWindow'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'FF96'

  == Outdated reference: A later version (-02) exists of
     draft-ietf-tcpimpl-newreno-00

  ** Downref: Normative reference to an Experimental draft:
     draft-ietf-tcpimpl-newreno (ref. 'FH98')

  -- Possible downref: Non-RFC (?) normative reference: ref. 'Flo95'

  ** Downref: Normative reference to an Experimental draft: draft-kksjf-ecn
     (ref. 'RF98')

  -- Possible downref: Non-RFC (?) normative reference: ref. 'Hoe96'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'Hoe95'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'Jac88'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'Lak94'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'MM96a'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'MM96b'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'MSMO97'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'OKM96a'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'OKM96b'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'Pax97a'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'Pax97b'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'Pax97c'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'PFTK98'

  ** Obsolete normative reference: RFC 1323 (Obsoleted by RFC 7323)

  ** Obsolete normative reference: RFC 2001 (Obsoleted by RFC 2581)

  ** Downref: Normative reference to an Informational RFC: RFC 2330

  -- Possible downref: Non-RFC (?) normative reference: ref. 'Ste94'


     Summary: 17 errors (**), 0 flaws (~~), 13 warnings (==), 20 comments
     (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1	  INTERNET-DRAFT          Expires June 1999            INTERNET-DRAFT

3	  Network Working Group                                   Matt Mathis
4	  INTERNET-DRAFT                     Pittsburgh Supercomputing Center
5	  Expiration Date: June 1999                              Mark Allman
6	                                                           NASA Lewis

8	                  Empirical Bulk Transfer Capacity

10	              < draft-ietf-ippm-btc-framework-00.txt >

12	  Status of this Document

14	   This document is an Internet-Draft.  Internet-Drafts are working
15	   documents of the Internet Engineering Task Force (IETF), its areas,
16	   and its working groups.  Note that other groups may also distribute
17	   working documents as Internet-Drafts.

19	   Internet-Drafts are draft documents valid for a maximum of six
20	   months, and may be updated, replaced, or obsoleted by other documents
21	   at any time.  It is inappropriate to use Internet-Drafts as reference
22	   material or to cite them other than as "work in progress."

24	   To view the entire list of current Internet-Drafts, please check the
25	   "1id-abstracts.txt" listing contained in the Internet-Drafts shadow
26	   directories on ftp.is.co.za (Africa), nic.nordu.net (Northern
27	   Europe), ftp.nis.garr.it (Southern Europe), munnari.oz.au (Pacific
28	   Rim), ftp.ietf.org (US East Coast), or ftp.isi.edu (US West Coast).

30	   This memo provides information for the Internet community.  This memo
31	   does not specify an Internet standard of any kind.  Distribution of
32	   this memo is unlimited.

34	  Abstract:

36	    Bulk Transport Capacity (BTC) is a measure of a network's ability
37	    to transfer significant quantities of data with a single
38	    congestion-aware transport connection (e.g., TCP).  The intuitive
39	    definition of BTC is the expected long term average data rate
40	    (bits per second) of a single ideal TCP implementation over the
41	    path in question.  However, there are many congestion control
42	    algorithms (and hence transport implementations) permitted by
43	    IETF standards.  This diversity in transport algorithms creates a
44	    difficulty for standardizing BTC metrics because the allowed
45	    diversity is sufficient to lead to situations where different
46	    implementations will yield non-comparable measures -- and
47	    potentially fail the formal tests for being a metric.

49	    This document defines a framework for standardizing multiple BTC
50	    metrics that parallel the permitted transport diversity.  Two
51	    approaches are used.  First, each BTC metric must be much more
52	    tightly specified than the typical IETF protocol.  Pseudo-code or
53	    reference implementations are expected to be the norm.  Second,
54	    each BTC methodology is expected to collect some ancillary metrics
55	    which are potentially useful to support analytical models of BTC.

57	1.  Introduction

59	    Bulk Transport Capacity (BTC) is a measure of a network's ability
60	    to transfer significant quantities of data with a single
61	    congestion-aware transport connection (e.g., TCP).  For many
62	    applications the BTC of the underlying network dominates the
63	    overall elapsed time for the application and thus dominates the
64	    performance as perceived by a user.  Examples of such
65	    applications include FTP, and the world wide web when delivering
66	    large images or documents.

68	    The intuitive definition of BTC is the expected long term average
69	    data rate (bits per second) of a single ideal TCP implementation
70	    over the path in question.

72	    Central to the notion of bulk transport capacity is the idea that
73	    all transport protocols should have similar responses to
74	    congestion in the Internet.  Indeed the only form of equity
75	    significantly deployed in the Internet today is that the vast
76	    majority of all traffic is carried by TCP implementations sharing
77	    common congestion control algorithms largely due to a shared
78	    developmental heritage.

80	    [RFC2001.bis] specifies the standard congestion control algorithms
81	    used by these TCP implementations.  Even though this document is a
82	    (proposed) standard, it permits considerable latitude in
83	    implementation.  This latitude is by design, to encourage ongoing
84	    evolution in congestion control algorithms.

86	    This legal diversity in transport algorithms creates a
87	    difficulty for standardizing BTC metrics because the allowed
88	    diversity is sufficient to lead to situations where different
89	    implementations will yield non-comparable measures -- and
90	    potentially fail the formal tests for being a metric.

92	@@@ A more serious problem is that most of the existing CC algorithms
93	@ do not assure that improving the properties of a path improves the
94	@ measure of that path.   That is existing TCP implementations do not
95	@ always have performance that monotonically increases with true path
96	@ capacity.
97	#
98	# OK.  I'll leave that to you...  I think it needs said and supported
99	# with some explanation.  --allman
100	@ Next pass --MM--

102	    Furthermore congestion control and related areas, including
103	    Integrated services[@@], differentiated services[@@] and Internet
104	    traffic analysis[@@] are all currently receiving a lot of
105	    attention from the research community.  It is very likely that we
106	    will see new experimental congestion control algorithms in the near
107	    future.  In addition, explicit congestion notification (ECN)
108	    [RFC98] is being tested for Internet deployment.  We do not yet
109	    know how any of these developments might affect BTC metrics.

111	    This document defines a framework for standardizing multiple BTC
112	    metrics that parallel the permitted transport diversity.  Two
113	    approaches are used.  First, each BTC metric must be much more
114	    tightly specified than the typical IETF protocol.  Pseudo-code or
115	    reference implementations are expected to be the norm.  Second,
116	    each BTC methodology is expected to collect some ancillary metrics
117	    which are potentially useful to support analytical models of BTC.

119	    For example, the models in [PFTK98, MSMO97, OKM96a, Lak94] all
120	    predict bulk performance based on path properties such as loss
121	    rate, round trip time, etc.  A BTC methodology which also provides
122	    ancillary measures of these properties is stronger because
123	    agreement with the analytical models can be used to corroborate
124	    the direct BTC measurement results.
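
    As an illustration of such a cross-check, the sketch below evaluates
    the model of [MSMO97], in which the congestion avoidance rate is
    approximately (MSS/RTT) * C/sqrt(p), and compares it with a measured
    rate.  The function names, the default C = sqrt(3/2) (periodic loss,
    one ACK per segment) and the 50% agreement tolerance are assumptions
    made for this example; they are not part of any BTC metric.

```python
import math


def predicted_rate_bps(mss_bytes, rtt_s, loss_rate, c=math.sqrt(1.5)):
    """Model-predicted congestion avoidance rate in bits per second.

    Evaluates (MSS / RTT) * C / sqrt(p).  The default C assumes periodic
    loss and an ACK for every segment (an assumption of this sketch).
    """
    if rtt_s <= 0 or not 0 < loss_rate <= 1:
        raise ValueError("need rtt_s > 0 and 0 < loss_rate <= 1")
    return 8.0 * mss_bytes * c / (rtt_s * math.sqrt(loss_rate))


def corroborates(measured_bps, predicted_bps, tolerance=0.5):
    """True if the measurement is within a relative tolerance of the model."""
    return abs(measured_bps - predicted_bps) <= tolerance * predicted_bps
```

    A BTC tool that reports its ancillary loss rate and round trip time
    could feed them to predicted_rate_bps() and flag any run for which
    corroborates() is false as needing closer inspection.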

126	    More importantly these ancillary metrics are expected to be useful
127	    for resolving disparity between different BTC metrics.  For
128	    example, a path that predominantly experiences clustered packet
129	    losses is likely to exhibit vastly different measures from BTC
130	    metrics that mimic Tahoe, Reno, NewReno, and SACK TCP
131	    algorithms [FF96].  The differences in the BTC metrics over
132	    such a path might be diagnosed by an ancillary measure of loss
133	    clustering.

135	    Furthermore there are some path properties which are best measured
136	    as ancillary metrics to a transport protocol.  Examples of such
137	    properties include bottleneck queue limits or the tendency to
138	    reorder packets.  These are difficult or impossible to measure at
139	    low rates and unsafe to measure at rates higher than the bulk
140	    transport capacity of the path.

142	    It is expected that at some point in the future there will exist
143	    an A-frame [RFC2330] which will unify all simple path metrics
144	    (e.g., segment loss rates, round trip time) and BTC ancillary
145	    metrics (e.g., queue size and packet reordering) with different
146	    versions of BTC metrics (e.g., that parallel Reno or SACK TCP).

148	2.  Congestion Control Algorithms

150	    Nearly all TCP implementations in use today are based on
151	    congestion control algorithms published in [Jac88] and further
152	    refined in [RFC2001,RFC2001.bis].  In addition to the basic notion
153	    of using an ACK clock, TCP (and therefore BTC) implements five
154	    standard congestion control algorithms: Congestion Avoidance,
155	    Retransmission timeouts, Slow-start, Fast Retransmit and Fast
156	    Recovery.  All BTC implementations must use these algorithms as
157	    they are defined in [RFC2001.bis].  However, in all cases a BTC
158	    metric must more tightly specify these algorithms, as discussed
159	    below.

161	2.1 Congestion Avoidance

163	    The Congestion Avoidance algorithm drives the steady-state bulk
164	    transfer behavior of TCP.  It calls for opening the congestion
165	    window (cwnd) by a constant additive amount during each round trip
166	    time (RTT), and closing it by a constant multiplicative fraction
167	    on congestion, as indicated by lost segments.  The window closing
168	    is specified to be half the number of outstanding data segments in
169	    flight when loss is detected.  A BTC metric must specify the
170	    following Congestion Avoidance details:

172	        The exact algorithm for incrementing cwnd is left to the
173	        implementer.  Several candidate algorithms are outlined in
174	        [RFC2001.bis].  In addition, some of these algorithms include some
175	        rounding.  For these reasons, the exact algorithm for increasing
176	        cwnd during congestion avoidance must be fully specified for
177	        each BTC metric defined.

179	        [RFC2001.bis] permits an extra plus one segment window
180	        adjustment following the multiplicative closing of cwnd.  This
181	        is because [RFC2001.bis] allows a single invocation of the Slow-Start
182	        algorithm when cwnd equals ssthresh at the end of
183	        recovery.
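
    To show why the increment rule and its rounding need to be pinned
    down, the following sketch compares one candidate per-ACK increase
    (MSS*MSS/cwnd, with cwnd kept in bytes) in integer and exact
    arithmetic.  The function names, the choice of this particular
    candidate, and the two-segment floor on the reduced window are
    assumptions of the example, not requirements of this framework.

```python
def ca_increment_integer(cwnd, mss):
    """Per-ACK increase in whole bytes: MSS*MSS/cwnd, rounded down, min 1."""
    return cwnd + max(1, (mss * mss) // cwnd)


def ca_increment_exact(cwnd, mss):
    """The same increase computed without rounding."""
    return cwnd + (mss * mss) / cwnd


def ssthresh_after_loss(flight_size, mss):
    """Multiplicative decrease: half the data in flight, at least 2 segments."""
    return max(flight_size // 2, 2 * mss)


def grow_for_one_window(cwnd, mss, increment):
    """Apply one ACK per full-sized segment of the starting window."""
    for _ in range(int(cwnd // mss)):
        cwnd = increment(cwnd, mss)
    return cwnd
```

    Starting from cwnd = 10000 bytes with a 1000 byte MSS, the integer
    variant reaches 10956 bytes after one window of ACKs while the exact
    variant ends slightly higher.  Over a long transfer such differences
    accumulate, which is one reason two "conforming" implementations can
    report different capacities.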

185	2.2 Retransmission Timeouts

187	    In order to provide reliable data delivery, TCP resends a segment if
188	    the ACK for the given segment does not arrive before the
189	    retransmission timer (RTO) fires.  A BTC metric must implement an
190	    RTO timer to trigger retransmissions not handled by the fast
191	    retransmit algorithm.  Such retransmissions can have a large impact
192	    on the measured capacity.  Calculating the RTO is subject to a
193	    number of details that are not standardized.  When implementing a
194	    BTC metric the details of the RTO calculation, how and when the
195	    clock is set, as well as the clock granularity must be fully
196	    documented.
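
    The kind of detail that must be documented can be seen in the sketch
    below, a smoothed RTT and variance estimator in the style of [Jac88].
    The gains (1/8 and 1/4), the factor of 4 on the variance, the
    initialization from the first sample, and the default granularity,
    minimum, maximum and initial values are all assumptions of this
    example; a BTC metric would have to state its own choices.

```python
import math


class RtoEstimator:
    """Jac88-style RTO estimator with an explicit clock granularity."""

    def __init__(self, granularity_s=0.5, min_rto_s=1.0,
                 max_rto_s=64.0, initial_rto_s=3.0):
        self.granularity_s = granularity_s
        self.min_rto_s = min_rto_s
        self.max_rto_s = max_rto_s
        self.initial_rto_s = initial_rto_s
        self.srtt = None      # smoothed round trip time
        self.rttvar = None    # smoothed mean deviation

    def sample(self, rtt_s):
        """Fold in one RTT measurement and return the new RTO."""
        if self.srtt is None:
            self.srtt = rtt_s
            self.rttvar = rtt_s / 2
        else:
            err = rtt_s - self.srtt
            self.srtt += err / 8
            self.rttvar += (abs(err) - self.rttvar) / 4
        return self.rto()

    def rto(self):
        """Current RTO, rounded up to a whole number of clock ticks."""
        if self.srtt is None:
            return self.initial_rto_s
        raw = self.srtt + 4 * self.rttvar
        ticks = math.ceil(raw / self.granularity_s)
        rounded = ticks * self.granularity_s
        return min(self.max_rto_s, max(self.min_rto_s, rounded))
```

    With a single 0.5 second RTT sample and no minimum, a 1 second clock
    yields an RTO of 2.0 seconds while a 0.25 second clock yields 1.5
    seconds, so the clock granularity alone changes when a timeout fires.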

198	2.3 Slow Start

200	    Slow start is part of TCP's transient behavior.  It is used to
201	    quickly bring new or recently restarted connections up to an
202	    appropriate congestion window.  In addition, slow start is used to
203	    restart the ACK clock after a retransmission timeout.  A BTC
204	    implementation must use the slow start algorithm, as specified by
205	    [RFC2001.bis].  The slow start algorithm is used while the congestion
206	    window (cwnd) is less than the slow start threshold (ssthresh).
207	    However, whether to use slow start or congestion avoidance when cwnd
208	    equals ssthresh is left to the implementer by [RFC2001.bis].  This
209	    detail must be specified in every specific BTC metric definition.
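
    The choice at cwnd equal to ssthresh can be made explicit as a
    parameter, as in the following sketch.  The function name, the flag
    name, and the specific per-ACK increments (one MSS in slow start,
    MSS*MSS/cwnd in congestion avoidance) are assumptions of the example.

```python
def window_after_ack(cwnd, ssthresh, mss, slow_start_when_equal):
    """Return the new cwnd (bytes) after one ACK of new data.

    slow_start_when_equal records the implementation choice that a BTC
    metric definition has to fix: which algorithm runs when
    cwnd == ssthresh.
    """
    in_slow_start = cwnd < ssthresh or (
        slow_start_when_equal and cwnd == ssthresh)
    if in_slow_start:
        return cwnd + mss                      # exponential opening
    return cwnd + max(1, (mss * mss) // cwnd)  # additive opening
```

    With cwnd = ssthresh = 8000 bytes and a 1000 byte MSS, one choice
    yields 9000 bytes after the next ACK and the other 8125 bytes.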

211	2.4 Fast Retransmit/Fast Recovery

213	    The Fast Retransmit/Fast Recovery algorithms are used to infer
214	    segment loss before the RTO expires.  A BTC implementation must
215	    implement the algorithms as defined in [RFC2001.bis].

217	    In Reno TCP, Fast Retransmit and Fast Recovery are used to support
218	    the Congestion Avoidance algorithm during recovery from lost
219	    segments.  During Fast Recovery, the data receiver sends duplicate
220	    acknowledgments.  The data sender uses these duplicate ACKs to
221	    detect loss, to estimate the quantity of data in the network still
222	    pending delivery and to clock out new data in an effort to keep the
223	    ACK clock running.
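
    A minimal sketch of the Reno style sender state described above
    follows.  The class and method names are hypothetical, and the
    threshold of three duplicate ACKs, the window inflation by one MSS
    per additional duplicate ACK, and the two-segment floor on ssthresh
    are assumptions taken from common Reno practice rather than from
    this document.

```python
class RenoSender:
    """Tracks cwnd and ssthresh (bytes) across duplicate and new ACKs."""

    DUP_ACK_THRESHOLD = 3

    def __init__(self, cwnd, ssthresh, mss):
        self.cwnd = cwnd
        self.ssthresh = ssthresh
        self.mss = mss
        self.dup_acks = 0
        self.in_recovery = False

    def on_dup_ack(self, flight_size):
        """Return True when the missing segment should be retransmitted."""
        self.dup_acks += 1
        if self.in_recovery:
            # Each further duplicate ACK means a segment left the network,
            # so inflate the window to clock out new data.
            self.cwnd += self.mss
            return False
        if self.dup_acks == self.DUP_ACK_THRESHOLD:
            self.ssthresh = max(flight_size // 2, 2 * self.mss)
            self.cwnd = self.ssthresh + self.DUP_ACK_THRESHOLD * self.mss
            self.in_recovery = True
            return True
        return False

    def on_new_ack(self):
        """An ACK for new data ends recovery and deflates the window."""
        if self.in_recovery:
            self.cwnd = self.ssthresh
            self.in_recovery = False
        self.dup_acks = 0
```

    Counting how often on_dup_ack() returns True, and how often recovery
    ends in a timeout instead of on_new_ack(), gives the raw data for
    the ancillary metrics of Section 3.2.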

225	2.5 Advanced Recovery Algorithms

227	    It has been observed that under some conditions the Fast
228	    Retransmit and Fast Recovery algorithms do not reliably preserve
229	    TCP's Self-Clock, causing unpredictable or unstable TCP
230	    performance [Lak94@@@check, Flo95].  Simulations of reference TCP
231	    implementations have uncovered situations where incidental changes
232	    in other parts of the network have a large effect on performance
233	    [MM96a].  Other simulations have shown that under some
234	    conditions, slightly better networks (higher bandwidth, lower
235	    delay or less load from other connections) yield lower throughput.
236	@@@ This is pretty easy to construct, but has it been published?
237	# Not that I can think of off the top of my head...  Maybe a concrete
238	# example to back up the claim?  --allman

240	    [RFC2001.bis] allows a TCP implementation to use more robust loss
241	    recovery algorithms, such as NewReno type algorithms
242	    [FH98,FF96,Hoe96] and SACK-based algorithms [FF96,MM96a,MM96b].
243	    While allowing these algorithms, [RFC2001.bis] does not define any
244	    such algorithm and therefore, a BTC metric that implements
245	    advanced recovery algorithms must fully specify the details.

247	    Note that since TCP based on standard Fast Retransmit and Fast
248	    Recovery sometimes exhibits erratic performance [MM96a], these
249	    algorithms may prove to be unsuitable for use in a metric.
250	# Ouch...  I know what you're saying, but...  If the goal is to see what
251	# congestion-aware transport connection yields, I think the above is a
252	# little harsh given the current standardized CC algorithms.

254	2.6 Segment Size

256	    The actual segment size, or method of choosing a segment size
257	    (e.g., path MTU discovery [RFC1191]) and the number of header
258	    bytes assumed to be prepended to each segment must be specified.
259	    In addition if the segment size is artificially limited to less
260	    than the path MTU this must be indicated.
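
    One way to record these items is a small structure such as the one
    sketched below.  The class name, field names and the default of 40
    header bytes (a 20 byte IPv4 header plus a 20 byte TCP header, no
    options) are assumptions of the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SegmentSizeSpec:
    """Segment size details that a BTC report would state explicitly."""

    selection_method: str   # e.g. "path MTU discovery" or "manual"
    path_mtu_bytes: int
    segment_bytes: int
    header_bytes: int = 40  # assumed: IPv4 + TCP headers, no options

    @property
    def artificially_limited(self):
        """True when the segment plus headers is smaller than the path MTU."""
        return self.segment_bytes + self.header_bytes < self.path_mtu_bytes
```

    For a 1500 byte path MTU, a 1460 byte segment is not flagged as
    limited, whereas a manually configured 536 byte segment is.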

262	3 Ancillary Metrics

264	    The following ancillary metrics should be implemented in every BTC
265	    that can exhibit the relevant behaviors.  Alternatively, the BTC
266	    implementation should provide enough information that the following
267	    information can be gathered in post-processing (e.g., by providing a
268	    segment trace of the connection).

270	3.1 Congestion Avoidance Capacity

272	    Define a pure "Congestion Avoidance Capacity" (CAC) metric to be
273	    the data rate (bits per second) of a fully specified
274	    implementation of the Congestion Avoidance algorithm, subject to
275	    the restriction that the Retransmission Timeout and Slow-Start
276	    algorithms are not invoked.  The CAC metric is defined to have no
277	    meaning across Retransmission Timeouts or Slow-Start (except the
278	    single segment Slow-Start that is permitted to follow recovery).

280	    In principle a CAC metric would be an ideal BTC metric.  But there
281	    is a rather substantial difficulty with using it as such.  The
282	    Self-Clocking of the Congestion Avoidance algorithm can be very
283	    fragile, depending on the specific details of the Fast Retransmit,
284	    Fast Recovery or advanced recovery algorithms above.

286	    When TCP loses Self-Clock it is reestablished through a
287	    retransmission timeout and Slow-Start.   These algorithms nearly
288	    always take more time than Congestion Avoidance would have taken.

290	    It is easily observed that unless the network loses an entire
291	    window of data (which would clearly require a retransmit timeout),
292	    a TCP that times out has missed some opportunity to send data.
293	    That is, if TCP experiences a timeout after losing any partial
294	    window of data, it must have received at least one ACK that was
295	    generated after some of that data was delivered but did not
296	    trigger any new transmission.  Much recent research in congestion
297	    control (e.g., FACK[MM96a], NewReno[FH98], [LowWindow]) can be
298	    characterized as making TCP's Self-Clock more tenacious, while
299	    preserving fairness under adverse conditions.  This work is often
300	    motivated by how poorly current TCP implementations perform under
301	    some conditions, often due to repeated clock loss.  Since this is
302	    an active research area, different TCP implementations have rather
303	    considerable differences in their ability to preserve Self-Clock.

305	3.2 Ancillary metrics relating to the preservation of Self-Clock

307	    Since losing the clock can have a large effect on the overall BTC,
308	    and the clock is itself fragile in ways that are very dependent on
309	    the recovery algorithm, it is important that the transitions between
310	    timer driven and Self-Clocked operation be instrumented.

312	3.2.1 Lost transmission opportunities

314	    If the last event before a timeout was the receipt of an ACK that
315	    did not trigger a retransmission, the possibility exists that
316	    some other congestion control algorithm would have successfully
317	    preserved the Self-Clock.  In this event, instrumenting key parts
318	    of the BTC state (e.g., cwnd) may lead to further improvements in
319	    congestion control algorithms.

321	    Note that in the absence of knowledge about the future, it is not
322	    possible to design an algorithm that never misses transmission
323	    opportunities.  However, there are ever more subtle ways to gauge
324	    network state, and to estimate if a given ACK is likely to be the
325	    last.

327	3.2.2 Losing an entire window

329	    If an entire window of data (or ACKs) is lost, there will be no
330	    returning ACKs to clock out additional data.  This condition can
331	    be detected if the last event before a timeout was a data
332	    transmission triggered by an ACK.  The loss of an entire window
333	    of data/ACKs forces recovery to be via a Retransmission Timeout and
334	    Slow-Start.

336	    Losing an entire window of data implies an outage with a duration
337	    at least as long as a round trip time.  Such an outage can not be
338	    diagnosed with low rate metrics and is unsafe to diagnose at
339	    higher rates than the BTC.  Therefore all BTC metrics should
340	    instrument and report losses of an entire window of data.

342	    There are some conditions, such as at very small windows, in which
343	    there is a significant probability that an entire window can be
344	    legitimately lost through individual random losses.
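
    The two timeout cases of Sections 3.2.1 and 3.2.2 can be told apart
    from the event immediately preceding each timeout, as in the sketch
    below.  The event labels ("ack_send", "ack_no_send", "timeout") and
    the category names are hypothetical; a real BTC implementation would
    derive them from its own trace format.

```python
from collections import Counter


def classify_timeouts(events):
    """Count timeouts by the kind of event that immediately preceded them.

    events is a sequence of labels in time order:
      "ack_send"    -- an ACK arrived and triggered a data transmission
      "ack_no_send" -- an ACK arrived but nothing was transmitted
      "timeout"     -- the retransmission timer expired
    """
    counts = Counter()
    previous = None
    for event in events:
        if event == "timeout":
            if previous == "ack_no_send":
                counts["lost_opportunity"] += 1   # Section 3.2.1
            elif previous == "ack_send":
                counts["entire_window_lost"] += 1  # Section 3.2.2
            else:
                counts["other"] += 1  # e.g. back-to-back timeouts
        previous = event
    return counts
```

    As noted above, at very small windows an "entire_window_lost" count
    may reflect independent random losses rather than an outage, so the
    window size at each timeout is worth recording alongside the count.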

346	3.2.3 Heroic clock preservation

348	    All algorithms that permit a given BTC to sustain Self-Clock when
349	    other algorithms might not, should be instrumented.  Furthermore,
350	    the details of the algorithms used must be fully documented.

352	        BTC metrics that can sustain Self-Clock in the presence of
353	        multiple losses within one round trip should instrument the
354	        loss distribution, such that the performance of Reno style
355	        bulk transport can be estimated.

357	        BTC algorithms that can trigger fast retransmits before
358	        receiving three duplicate acknowledgments (e.g., at small
359	        window [LowWindow]) should instrument and fully document
360	        these events as well.

362	3.2.4  False timeouts

364	    All false timeouts (where the retransmission timer expires before
365	    the ACK for some previously transmitted data arrives) should be
366	    instrumented when possible.  Note that depending upon how the BTC
367	    metric implements sequence numbers, this may be difficult to
368	    detect.

370	3.3 Ancillary metrics relating to flow based path properties

372	    All BTC metrics provide unique vantage points for instrumenting
373	    certain path properties relating to closely spaced packets.  As in
374	    the case of RTT duration outages, these can be impossible to
375	    diagnose at low rates (less than 1 packet per RTT) and
376	    inappropriate to test at rates above the BTC.

378	    All BTC metrics should instrument packet reordering.  The severity
379	    of the reordering can be classified as one of three different
380	    cases, each of which should be instrumented.

382	        Packets that are only slightly out of order should not trigger
383	        retransmission, but they may affect the window calculation.
384	        BTC metrics must document how slightly out-of-order packets
385	        affect the congestion window calculation.  The frequency and
386	        distance out of sequence must be instrumented for all
387	        out-of-order packets.

389	        If packets are sufficiently out-of-order, the Fast Retransmit
390	        algorithm will be invoked in advance of the delayed packet's
391	        late arrival.  These events must be instrumented.
392		Even though the late arriving packet will complete
393	        recovery, the window must still be reduced by half.

395		Under some rare conditions packets have been observed that are
396		far out of order - sometimes many seconds late [Pax97b].
397	        These should always be instrumented.
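
    The three cases can be separated in post-processing of a receiver
    side segment trace, as in the following sketch.  The function name,
    the trace format, the duplicate ACK threshold of 3 and the 1 second
    cutoff for "far out of order" are assumptions of the example, and the
    sketch assumes every segment appears in the trace exactly once.

```python
def classify_reordering(arrivals, dupack_threshold=3, far_late_s=1.0):
    """Classify each late segment in a trace of (time_s, segment_index).

    Returns a list of (segment_index, distance, kind) where distance is
    the number of higher-numbered segments that arrived first and kind
    is "slight", "fast_retransmit" or "far".
    """
    seen = []
    report = []
    for time_s, index in arrivals:
        # Arrival times of higher-numbered segments already received.
        earlier = [t for t, i in seen if i > index]
        if earlier:
            distance = len(earlier)
            lateness_s = time_s - min(earlier)
            if lateness_s >= far_late_s:
                kind = "far"
            elif distance >= dupack_threshold:
                kind = "fast_retransmit"
            else:
                kind = "slight"
            report.append((index, distance, kind))
        seen.append((time_s, index))
    return report
```

    The distance field gives the "distance out of sequence" directly,
    and the number of entries of each kind gives the frequency.  The
    quadratic scan is acceptable for a sketch but would need an indexed
    structure for long traces.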

399	    The BTC should instrument the maximum cwnd observed during
400	    congestion avoidance and slow start.  A TCP running over the same
401	    path must have sufficient sender buffer space and receiver window
402	    (and window shift [RFC1323]) to cover this cwnd.
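
    The window shift needed to cover an observed cwnd follows from the
    16 bit TCP window field and the [RFC1323] limit of 14 on the shift
    count.  The sketch below computes it; the function name is
    hypothetical.

```python
MAX_UNSCALED_WINDOW = 65535  # largest value of the 16 bit window field
MAX_WINDOW_SHIFT = 14        # largest shift count allowed by RFC 1323


def required_window_shift(max_cwnd_bytes):
    """Smallest window shift whose scaled window covers max_cwnd_bytes."""
    for shift in range(MAX_WINDOW_SHIFT + 1):
        if (MAX_UNSCALED_WINDOW << shift) >= max_cwnd_bytes:
            return shift
    raise ValueError("cwnd exceeds the largest scaled TCP window")
```

    A maximum cwnd of 1,000,000 bytes, for example, requires a shift of
    4; any non-zero result means that a TCP without window scaling could
    not match the BTC measurement on that path.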

404	    There are several other path properties that one might measure
405	    within a BTC metric.  For example, with an embedded one-way delay
406	    metric it may be possible to measure how queueing delay and
407	    (RED) drop probabilities are correlated to window size.
408	    These are all open research questions.

410	3.4 Ancillary metrics pertaining to MTU discovery

412	    Under some conditions, BTC can be very sensitive to segment size.
413	    In addition to instrumenting the segment size, a BTC metric should
414	    indicate how it was selected: by path MTU discovery [RFC1191], a
415	    manual control, system default, or the maximum MTU for the
416	    interface.

418	    Note that the most popular LAN technologies have smaller MTUs
419	    than nearly all WAN technologies.  As a consequence, it is
420	    difficult to measure the true performance of a wide area path
421	    without subjecting it to the smaller MTU of the LAN.

423	3.4 Ancillary metrics as calibration checks

425	    Unlike low rate metrics, BTC must have explicit checks that the
426	    test platform is not the bottleneck, either due to insufficient
427	    tester data rate or buffer space.

429	    Ideally all queues within the tester should be instrumented.  All
430	    packets dropped within the tester should be instrumented as tester
431	    failures, invalidating a measurement.

433	    The maximum queue lengths should be instrumented.  Any significant
434	    queue may indicate that the tester itself has insufficient burst
435	    data rate, and is slightly smoothing the data into the network.

437	3.4.3  Validate Reverse path load

439	    @@@@ What happens to a BTC when the reverse path is congested?  Is
440	    this identical to TCP?  What should happen?  How should it be
441	    instrumented?
442	#
443	# Some implementations (mine!) have an annoying feature whereby ACK loss
444	# looks just like data loss.  This should be documented.  If ACK loss
445	# and data loss can be detected separately, I think ACK loss rate should
446	# be reported, as it slightly changes the ACK clock (can impact
447	# algorithms like slow start that work on a per ACK basis and can make
448	# the sender more bursty, which could cause more loss).
449	@ and mine --MM--

451	3.5 Ancillary metrics relating to the need for advanced TCP features

453	    If TCP would require RFC1323 features (window scaling, timestamp
454	    based round trip time measurement, protection from wrapped
455	    sequences, etc.) to match the BTC performance, this should be
456	    reported.

458	4 Acknowledgments

460		Jeff Semke, for numerous clarifications.

462	5  References

464	    [LowWindow]  @@@@@ Current work

466	    [FF96] Fall, K., Floyd, S., "Simulation-based Comparisons of Tahoe,
467	       Reno and SACK TCP".  Computer Communication Review, July 1996.
468	       ftp://ftp.ee.lbl.gov/papers/sacks.ps.Z.

470	    [FH98] Floyd, S., Henderson, T., "The NewReno Modification to
471	       TCP's Fast Recovery Algorithm", Work in progress
472	       draft-ietf-tcpimpl-newreno-00.txt

474	    [Flo95] Floyd, S., "TCP and successive fast retransmits",
475	       March 1995, Obtain via ftp://ftp.ee.lbl.gov/papers/fastretrans.ps.

477	    [RF98] Ramakrishnan, K., Floyd, S., "A Proposal to add Explicit
478	       Congestion Notification (ECN) to IP", Work in progress
479	       draft-kksjf-ecn-03.txt

481	    [Hoe96] Hoe, J., "Improving the start-up behavior of a congestion
482	       control scheme for TCP", Proceedings of ACM SIGCOMM '96,
483	       August 1996.

485	    [Hoe95] Hoe, J., "Startup dynamics of TCP's congestion control
486	       and avoidance schemes", Master's thesis, Massachusetts Institute
487	       of Technology, June 1995.

489	    [Jac88] Jacobson, V., "Congestion Avoidance and Control",
490	       Proceedings of SIGCOMM '88, Stanford, CA., August 1988.

492	    [Lak94] Lakshman, Effects of random loss

494	    [MM96a] Mathis, M., Mahdavi, J., "Forward acknowledgment:
495	       Refining TCP congestion control",  Proceedings of ACM SIGCOMM '96,
496	       Stanford, CA., August 1996.

498	    [MM96b] Mathis, M., Mahdavi, J., "TCP Rate-Halving with Bounding
499	       Parameters", Available from
500	       http://www.psc.edu/networking/papers/FACKnotes/current.

502	    [MSMO97] Mathis, M., Semke, J., Mahdavi, J., Ott, T.,
503	       "The Macroscopic Behavior of the TCP Congestion Avoidance
504	       Algorithm", Computer Communications Review, 27(3), July 1997.

506	    [OKM96a] Ott, T., Kemperman, J., Mathis, M., "The Stationary
507	       Behavior of Ideal TCP Congestion Avoidance", In progress, August
508	       1996. Obtain via pub/tjo/TCPwindow.ps using anonymous ftp to
509	       ftp.bellcore.com

511	    [OKM96b] Ott, T., Kemperman, J., Mathis, M., "Window Size
512	       Behavior in TCP/IP with Constant Loss Probability", DIMACS
513	       Special Year on Networks, Workshop on Performance of Real-Time
514	       Applications on the Internet, Nov 1996.

516	    [Pax97a] Paxson, V., "Automated Packet Trace Analysis of TCP
517	       Implementations", Proceedings of ACM SIGCOMM '97, August 1997.

519	    [Pax97b] Paxson, V., "End-to-End Internet Packet Dynamics",
520	       Proceedings of SIGCOMM '97, Cannes, France, Sep. 1997.

522	    [Pax97c] Paxson, V., editor, "Known TCP Implementation Problems",
523	       Work in progress: http://reality.sgi.com/sca/tcp-impl/prob-01.txt

525	    [PFTK98] Padhye, J., Firoiu, V., Towsley, D., and Kurose, J., "TCP
526	       Throughput: A Simple Model and its Empirical Validation",
527	       Proceedings of ACM SIGCOMM '98, August 1998.

529	    [RFC1191] Mogul, J., Deering, S., "Path MTU Discovery",
530	       November 1990, Obtain via:
531	       ftp://ds.internic.net/rfc/rfc1191.txt

533	    [RFC1323] Jacobson, V., Braden, R., Borman, D., "TCP Extensions
534	       for High Performance", May 1992, Obtain via:
535	       ftp://ds.internic.net/rfc/rfc1323.txt

537	    [RFC2001] Stevens, W., "TCP Slow Start, Congestion Avoidance,
538	       Fast Retransmit, and Fast Recovery Algorithms",
539	       ftp://ds.internic.net/rfc/rfc2001.txt

541	    [RFC2001.bis] Allman, M., Paxson, V., Stevens, W., "TCP Congestion
542	       Control". Work in progress draft-ietf-cong-control-01.txt, to
543	       update RFC2001.

545	    [RFC2018] Mathis, M., Mahdavi, J., Floyd, S., Romanow, A., "TCP
546	       Selective Acknowledgment Options", 1996, Obtain via:
547	       ftp://ds.internic.net/rfc/rfc2018.txt

549	    [RFC2330] Paxson, V., Almes, G., Mahdavi, J., Mathis, M.,
550	       "Framework for IP Performance Metrics", 1998, Obtain via:
551	       ftp://ds.internic.net/rfc/rfc2330.txt

553	    [Ste94] Stevens, W., "TCP/IP Illustrated, Volume 1: The
554	       Protocols", Addison-Wesley, 1994.

556	  Authors' Addresses

558	    Matt Mathis
559	    Pittsburgh Supercomputing Center
560	    4400 Fifth Ave.
561	    Pittsburgh PA 15213
562	    mathis@psc.edu
563	    http://www.psc.edu/~mathis

565	    Mark Allman
566	    NASA Lewis Research Center/Sterling Software
567	    21000 Brookpark Rd.  MS 54-2
568	    Cleveland, OH  44135
569	    216-433-6586
570	    mallman@lerc.nasa.gov
571	    http://gigahertz.lerc.nasa.gov/~mallman