Re: [ippm] Testing TCP Throughput Capacity in Operator Networks



Your procedure makes perfect sense for bench testing gear in a
laboratory, where you have complete control over all of the traffic.

NPAD's algorithm is essentially the same as your steps 1&2.  (It uses
LimCwnd in the web100 prototype of RFC 4898 to scan window sizes with
one TCP connection.)  I suspect that NPAD's queue measurement algorithm
captures most of the effects detected in your step 3, although not all
of them.  (For example, NPAD would not capture effects due to loss
synchronization between flows.)

IPPM has traditionally been focused on in situ testing of Internet
infrastructure, which also includes the effects of uncontrolled cross
traffic.  Please excuse me if this changed while I wasn't looking.
BMWG was focused on bench testing, as the name implies.  While it is
true that many metrics might be the same in both arenas, this is not
at all true for BTC.

I completely agree that a good BTC bench test is needed.  Your
proposal is a very good first step in this direction.

However, the world also desperately needs a way to write TCP
performance SLAs, which was my whole reason for starting IPPM some 14
years ago.  (Do people remember that it was called IP Provider Metrics
at one time?)  We knew we didn't have a clue how to do BTC, so we
started with easy metrics like availability, loss and reordering.
Remember, at the time we started, the basic TCP performance model
would not be published for another couple of years.

I think it is time to attempt a BTC metric for in situ Internet
infrastructure, one designed to support BTC SLAs.  (BTW the
algorithm I have in mind is not suitable for bench testing, so there
is no overlap.)

Thanks,
--MM--
-------------------------------------------
Matt Mathis      http://staff.psc.edu/mathis
Work:412.268.3319   Home/Cell:412.654.7529
-------------------------------------------
Evil is defined by mortals who think they know
"The Truth" and use force to apply it to others.


On Thu, Oct 15, 2009 at 5:45 PM, Barry Constantine
<Barry.Constantine at jdsu.com> wrote:
> Hi Matt,
>
> I understand your concerns with relation to TCP and you undoubtedly have
> vast experience in detailed TCP implementations and studies.  I have
> read several of your papers and they are all excellent.
>
> Let me try to clearly lay out the problem statement and then determine
> if it is worthy of a -00 draft.  I tried to keep this email concise,
> sorry if it is a little long-winded.
>
> In my work with network providers and network equipment manufacturers
> (NEMs), there is a growing realization that RFC-2544 packet level
> testing on today's networks is not adequate.  This community desires to
> measure network throughput performance at the TCP layer, to gain a much
> more meaningful measure that the network can meet the end user's
> application SLA (and ultimately reach some level of TCP testing
> interoperability which does not exist today).  The complexity of the
> network grows and the various queuing mechanisms in the network greatly
> affect TCP layer performance (e.g. improper default router settings for
> queuing, etc.) and devices such as firewalls, proxies, load-balancers
> can actively alter the TCP settings as a TCP session traverses the
> network (such as window size, MSS, etc.).  These are all very complex
> topics to the general network community, and I can't overemphasize the
> desire for a standard test methodology/guideline at the TCP layer.
>
> So the intent behind this draft TCP Throughput work would be to define a
> methodology for testing TCP layer performance, and guidelines for
> expected TCP throughput results that *should* be experienced in the
> network under test.  Network providers and NEMs are wrestling with
> end-end complexities of the above (queuing, active proxy devices, etc.);
> they desire to standardize the methodology to validate end-end TCP
> performance, as this is the precursor to acceptable end-user application
> performance.
>
> Before RFC-2544 testing existed, network providers and NEMs deployed a
> variety of ad hoc test techniques to verify the Layer2/3 performance of
> the network.  RFC-2544 was a huge step forward in the network test
> world, standardizing the Layer 2/3 test methodology which greatly
> improved the quality of the network (and reduced operational test
> expenses).
>
> In the case of TCP, several network providers that I work with employ
> the following TCP testing methodology (high level simplified example):
>
> 1. Run tests to determine average end-end latency and end-end bottleneck
> bandwidth (the minimum bandwidth may be known by design).
> 2. Calculate bandwidth delay product (BDP) for this situation then run a
> series of window size experiments with TCP hosts on each end of the
> network.  The network under test should be able to support the full TCP
> capacity unless there are packet loss conditions.  This may point to
> devices in the network that are altering the window size IF
> retransmissions are not the issue and throughput is not achieved.
> 3. Run multiple connection tests (e.g. 32 connections) to over-utilize
> the link.  Proper queuing techniques in the network should allow the TCP
> connections to maintain their "fair share" of the available bandwidth.
> Improper queuing (which is very common) would display bandwidth
> volatility between the connections.  This is a very valuable method as
> well.
> 4. Another example is running either single or multiple TCP
> connections with a variety of MSS values (similar to varying the frame
> size in RFC-2544).
>
> Right now there is no established methodology for any of the above tests
> in the network provider / NEM space.  So even taking a first step toward
> standardizing the test methodology at the TCP layer, would be a giant
> step forward.  As a matter of fact, this paper would probably be
> co-authored with myself and at least one network provider and/or NEM.
>
> So again, sorry for the long-winded response, but I hope that this
> provides better insight into the intent of such a draft.
>
> Thanks,
> Barry
>
> Principal Member of Technical Staff
> JDSU Communication Test (formerly Acterna)
> Emerging Markets and Technology Research
> One Milestone Center Court
> Germantown, MD 20876
> (W) 240-404-2227
> (C) 301-325-7069
>
> -----Original Message-----
> From: Matt Mathis [mailto:matt.mathis at gmail.com]
> Sent: Thursday, October 15, 2009 2:52 PM
> To: Henk Uijterwaal
> Cc: Lars Eggert; Barry Constantine; Matthew J Zekauskas; ippm at ietf.org
> Subject: Re: [ippm] Testing TCP Throughput Capacity in Operator Networks
>
>>>> My thoughts are similar to Henk's.  IPPM has always been interested in
>>>> the area (see the bulk transport capacity work), so I'd claim the area
>>>> is in-scope, but doing work would require AD discussion.
>>>
>>> I think it wouldn't be unreasonable for IPPM to take this on, if we more
>>> clearly nail down what "this" is.
>
> You might want to consider finishing the work started in RFCs 2330 and
> 3148 as a sufficient definition of "this".
>
> Developing a Bulk Transport Metric has been sort of a holy grail of
> IPPM from the very beginning.  It was my primary agenda when I
> co-chaired the BOF at the Danvers IETF (April 1995).  After
> publishing a tool [TReno] and making some important progress [RFC2330,
> RFC3148, and Mark Allman's [CAP]], the effort was abandoned, essentially
> because the results were not sufficiently robust to support SLAs.
> [NPAD] is the latest chapter of my efforts in this area.
>
> Although there are many difficulties with using TCP (notably
> differences in implementations, as noted in other messages), the real
> killer is this:
>
> TCP congestion control is an equilibrium process.  Correctly
> functioning TCP with sufficient data ALWAYS causes congestion
> somewhere in the network.  This congestion manifests itself by raising
> the RTT and/or the packet loss until the data rate is consistent with
> the model [MODEL]:
>
>     data_rate = (MSS/RTT) * (0.7/sqrt(loss_probability))
>
> The problem is you can't tell how much of the congestion was caused by
> your measurement, and how much was already present in the network.
> This causes sort of a double Heisenberg problem where you can't even
> tell if you are measuring bowling balls with ping-pong balls or vice
> versa.  This is because the "stiffness" of a TCP flow (think "first
> derivative of the model") depends on the RTT, so when there are
> multiple flows sharing a bottleneck, each flow yields a different
> amount, depending on their relative RTTs.  If the measurement flow has
> a very short RTT, it will push nearly all other traffic out of the
> bottleneck; if the cross traffic has a short RTT, it will greatly
> depress the data rate of the measured flow.  As a consequence, a simple
> measurement with a single TCP flow is useless for predicting the
> performance of another flow through the same bottleneck.  Note that a
> non-predictive measurement is useless to the point where it can't
> really be called a metric.
>
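The model quoted above can be explored numerically.  The following is
a minimal sketch, taking the formula and its 0.7 constant exactly as
written in the message; the function names and the sample MSS, RTT and
loss values are illustrative assumptions, not measurements.

```python
import math

C = 0.7  # constant as written in the quoted message


def model_rate_bytes_per_s(mss_bytes, rtt_s, loss_probability):
    """data_rate = (MSS/RTT) * (C/sqrt(loss_probability)), in bytes/s."""
    return (mss_bytes / rtt_s) * (C / math.sqrt(loss_probability))


def loss_for_rate(mss_bytes, rtt_s, target_bytes_per_s):
    """Invert the model: loss probability at which a flow settles
    at the given rate."""
    return (C * mss_bytes / (rtt_s * target_bytes_per_s)) ** 2


if __name__ == "__main__":
    mss, loss = 1460, 1e-4  # hypothetical values
    for rtt in (0.010, 0.100):
        rate = model_rate_bytes_per_s(mss, rtt, loss)
        print(f"RTT {rtt * 1e3:5.0f} ms -> {rate * 8 / 1e6:7.2f} Mb/s")
```

At the same loss probability the model rate scales as 1/RTT, so two
flows that see the same loss at a shared bottleneck but have a 10:1
RTT ratio settle at very different rates.  That is the RTT dependence
the paragraph above describes.
>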
> THIS IS REALLY IMPORTANT: even with an ideal TCP implementation and
> network, equilibrium TCP performance is useless as a metric.
>
> The trick (used by NPAD) is to throttle TCP with a controlled
> bottleneck, such that it does not cause congestion....
>
> This can be done using an RFC 4898 instrumented stack under real
> applications.  I will describe it in email in a bit, and in a
> presentation at IPPM, if people are interested.  No, I do not have
> enough cycles to generate an ID in time for the submission deadline.
>
> Thanks,
> --MM--
> -------------------------------------------
> Matt Mathis      http://staff.psc.edu/mathis
> Work:412.268.3319   Home/Cell:412.654.7529
> -------------------------------------------
> Evil is defined by mortals who think they know
> "The Truth" and use force to apply it to others.
>