Network Working Group                                       Matt Mathis
INTERNET-DRAFT                         Pittsburgh Supercomputing Center
Expiration Date: Aug 1999                                      Feb 1999

                     TReno Bulk Transfer Capacity

                  < draft-ietf-ippm-treno-btc-03.txt >

Status of this Document

This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups.  Note that
other groups may also distribute working documents as
Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other
documents at any time.
It is inappropriate to use Internet-Drafts as reference material or
to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

Abstract:

TReno is a tool to measure Bulk Transport Capacity (BTC) as defined
in [ippm-btc-framework].  This document specifies the details of the
TReno algorithm required by the BTC framework document.

2. Introduction:

This memo defines a Bulk Transport Capacity (BTC) based on the TReno
(``tree-no'') diagnostic [Mathis97a].  It builds on notions
introduced in the BTC framework document [ippm-btc-framework] and
the IPPM Framework document, RFC 2330 [@@]; the reader is assumed to
be familiar with both documents.

The BTC framework document defines pure Congestion Avoidance
Capacity (CAC) as the data rate (bits per second) of the Congestion
Avoidance algorithm, subject to the restriction that the
Retransmission Timeout and Slow-Start algorithms are not invoked.
In principle a CAC metric would be an ideal BTC metric, but there
are rather substantial difficulties in using it as such.  The
Self-Clocking of the Congestion Avoidance algorithm can be very
fragile, depending on the specific details of the Fast Retransmit,
Fast Recovery or other advanced recovery algorithms.  When TCP
loses its Self-Clock, it is reestablished through a retransmission
timeout and Slow-Start.  These algorithms nearly always take more
time than Congestion Avoidance would have taken.

The TReno program implements BTC, CAC and ancillary metrics.  The
ancillary metrics are designed to instrument all network events that
might cause discrepancies between an ideal CAC metric and the TReno
BTC, other BTC metrics, or real TCP implementations.
We use this multiple-metrics approach because the CAC metric is more
suitable for analytic modeling, while the BTC metrics are more
suited to applied measurement.  We believe that future research will
lead to a strong analytic framework (A-frame) [ippm-btc-framework]
that will result in understanding the relationships between CAC
metrics and other metrics, including simple metrics (delay, loss) as
well as the various different BTC metrics and TCP implementations.

3. The TReno BTC Definition

3.1. Metric Name:

TReno-Type-P-Bulk-Transfer-Capacity

3.2. Metric Parameters:

+ Src, the IP address of a host

+ Dst, the IP address of a host

+ Initial Maximum Segment Size

+ a test duration

+ T, a time

3.3. Metric Units:

Bits per second

3.4. Definition:

The average data rate attained by the TReno program over the path
under test.

3.5 Congestion Control Algorithms

The BTC framework document [ippm-btc-framework] makes the
observation that the standard specifying congestion control
algorithms [RFC2001.bis] allows more latitude in their
implementation than is appropriate for a metric.  Some of the
details of the congestion control algorithms that are left to the
discretion of the implementor must be fully specified in a metric.

3.5.1 Congestion Avoidance details

TReno computes the window size in bytes.  Each acknowledgment opens
the congestion window (cwnd) by MSS*MSS/cwnd bytes.  The actual
number of outstanding bytes in the network is always an integral
number of segments such that the total size is less than or equal to
cwnd.

@@@ The framework needs to require that delayed ACK emulation be
specified.

When a loss is detected the window is reduced using an algorithm
that sends one segment per two acknowledgments for exactly one
round trip (as determined by sequence numbers).
This reduces the window to exactly half of the data that was
actually held by the network at the time the first loss was
detected.  This algorithm, called Rate-Halving, is described in
detail in a separate technical note [facknote].  The new cwnd will
be (old_cwnd - loss)/2.

The technical note also describes an additional group of
algorithms, collectively called bounding parameters, that assure
that rate halving always arrives at a reasonable congestion window,
even under pathological conditions.  The bounding parameter
algorithms have no effect on TReno under normal conditions.  If the
bounding parameters are invoked, they are instrumented as an
exceptional network event.

One of the bounding parameters is to set ssthresh to 1/4 of the
pre-recovery cwnd.  Thus recovery normally ends with cwnd larger
than ssthresh, so TReno does not do a one-segment slow-start as
permitted by RFC2001.  However, if more than half a window of data
was lost, rate halving can arrive at a new cwnd which is smaller
than ssthresh, resulting in a slow-start up to ssthresh (which would
be 1/4 the prior value of cwnd).

3.5.2 Retransmission Timeouts

The current version of TReno does not include an accurate model of
the TCP retransmission timer.  Under nearly all normal conditions
the timers in TReno are much more conservative than real TCP
implementations.  TReno takes the view that timeouts indicate a
failure to attain a CAC measurement, which indicates an abnormality
in the network that should be diagnosed.  TReno does not experience
timeouts unless an entire window of data is lost.

3.5.3 Slow-Start

TReno invokes Slow-Start if cwnd is equal to or less than ssthresh.
Unlike most TCP implementations this condition is not normally true
at the end of recovery.
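The window arithmetic described in sections 3.5.1 through 3.5.3 can be
sketched as follows.  This is an illustrative Python sketch, not the
TReno source; the function names and the 1460-byte MSS are assumptions
made for the example.

```python
MSS = 1460  # assumed segment size in bytes (not specified here)

def open_window(cwnd, mss=MSS):
    """Congestion Avoidance: each ACK opens cwnd by MSS*MSS/cwnd bytes."""
    return cwnd + mss * mss / cwnd

def rate_halve(old_cwnd, lost_bytes):
    """Rate-Halving: after one round trip of sending one segment per
    two ACKs, the new cwnd is (old_cwnd - loss)/2, i.e. half the data
    actually held by the network when the first loss was detected.
    The bounding parameter sets ssthresh to 1/4 of the pre-recovery
    cwnd, so recovery normally ends with cwnd larger than ssthresh."""
    new_cwnd = (old_cwnd - lost_bytes) / 2
    ssthresh = old_cwnd / 4
    return new_cwnd, ssthresh

def in_slow_start(cwnd, ssthresh):
    """TReno slow-starts only while cwnd is <= ssthresh."""
    return cwnd <= ssthresh
```

With these rules, losing a single segment out of a 14600-byte window
leaves cwnd (6570 bytes) above ssthresh (3650 bytes), so no slow-start
occurs; only a loss of more than half the window drops cwnd below
ssthresh, matching the text above.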
3.5.4 Advanced Recovery Algorithms

The algorithm used by TReno to emulate the TCP reassembly queue
naturally emulates SACK [RFC2018] with the Forward Acknowledgment
Algorithm [Mathis96] as updated by [facknote].

3.5.5 Segment Size

TReno can dynamically discover the correct Maximum Segment Size
through path MTU discovery.  A smaller MTU can be explicitly
selected.

3.6 Ancillary results:

@@@ expand

- Statistics over the entire test
  (data transferred, duration and average rate)
- Statistics over the Congestion Avoidance portion of the test
  (data transferred, duration and average rate)
- Path property statistics (MTU, minimum RTT, maximum congestion
  window during Congestion Avoidance and during Slow-Start)
- Direct measures of the analytic model parameters (number of
  congestion signals, average RTT)
- Indications of which TCP algorithms must be present to attain
  the same performance.
- The estimated load/BW/buffering used on the return path
- Warnings about data transmission abnormalities
  (e.g. packets out-of-order, events that cause timeouts)
- Warnings about conditions which may affect metric accuracy
  (e.g. insufficient tester buffering)
- Alarms about serious data transmission abnormalities
  (e.g. data duplicated in the network)
- Alarms about internal inconsistencies of the tester and events
  which might invalidate the results.
- IP address/name of the responding target.
- TReno version.

3.7 Manual calibration checks:

The following discussion assumes that the TReno diagnostic is
implemented as a user-mode program running under a standard
operating system.  Other implementations, such as those in dedicated
measurement instruments, can have stronger built-in calibration
checks.

3.7.1 Tester performance

Verify that the tester and target have sufficient data rates to
sustain the test.
The raw performance (data rate) limitations of both the tester and
target should be measured by running TReno in a controlled
environment (e.g. a bench test).  Ideally the observed performance
limits should be validated by determining the nature of the
bottleneck and verifying that it agrees with other benchmarks of the
tester and target (e.g. that TReno performance agrees with direct
measures of backplane or memory bandwidth, or of other bottlenecks
as appropriate).  Currently no routers are reliable targets,
although under some conditions they can be used for meaningful
measurements.  When testing between a pair of modern computer
systems at a few megabits per second or less, the tester and target
are unlikely to be the bottleneck.

TReno may be less accurate at average rates above half of the known
tester or target limits.  This is because during the initial
Slow-Start TReno needs to send bursts at twice the average data
rate.

Likewise, if the link to the first hop is not more than twice as
fast as the entire path, some of the path properties, such as the
maximum congestion window during Slow-Start, may reflect the
tester's link interface and not the path itself.

3.7.2 Tester Buffering

Verify that the tester and target have sufficient buffering to
support the window needed by the test.

If they do not have sufficient buffer space, then losses at their
own queues may contribute to the apparent losses along the path.
There are several difficulties in verifying the tester and target
buffer capacity.  First, there are no good tests of the target's
buffer capacity at all.  Second, all validation of the tester's
buffering depends in some way on the accuracy of reports by the
tester's own operating system.
Third, there is the confusing result that under many circumstances
(particularly when there is much more than sufficient average tester
performance) insufficient buffering in the tester does not adversely
impact measured performance.

TReno reports (as calibration alarms) any events in which transmit
packets were refused due to insufficient buffer space.  It reports a
warning if the maximum measured congestion window is larger than the
reported buffer space.  Although these checks are likely to be
sufficient in most cases, they are probably not sufficient in all
cases, and will be the subject of future research.

Note that on a timesharing or multi-tasking system, other activity
on the tester introduces burstiness due to operating system
scheduler latency.  Since some queuing disciplines discriminate
against bursty sources, it is important that there be no other
system activity during a test.  This should be confirmed with
operating-system-specific tools.

3.7.3 Return Path performance

Verify that the return path is not a bottleneck at the load needed
to sustain the test.

In ICMP mode TReno measures the net effect of both the forward and
return paths on a single data stream.  Bottlenecks and packet losses
in the forward and return paths are treated equally.

In traceroute mode, TReno computes and reports the load it
contributes to the return path.  Unlike real TCP, TReno can not
distinguish between losses on the forward and return paths, so
ideally we want the return path to introduce as little loss as
possible.  A good way to test whether the return path has a large
effect on a measurement is to reduce the forward path messages down
to ACK size (40 bytes) and verify that the measured packet rate
improves by at least a factor of two.  [More research is needed.]
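The return-path check above can be sketched as follows.  These are
hypothetical helpers for illustration only, not part of TReno; the
function names are assumptions, and the 40-byte packet size is the
ACK size given in the text.

```python
def return_path_suspect(full_size_pps, ack_size_pps):
    """Apply the heuristic above: if shrinking the forward-path
    messages to ACK size (40 bytes) does not at least double the
    measured packet rate, the return path may be a bottleneck."""
    return ack_size_pps < 2 * full_size_pps

def return_path_load_bps(ack_rate_pps, ack_bytes=40):
    """Rough estimate of the load contributed to the return path:
    one 40-byte message per acknowledgment, converted to bits/s."""
    return ack_rate_pps * ack_bytes * 8
```

For example, a test that measures 1000 packets/s with full-size
messages but only 1800 packets/s with 40-byte messages fails the
factor-of-two check and warrants suspicion of the return path.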
3.8 Discussion:

There are many possible reasons why a TReno measurement might not
agree with the performance obtained by a TCP-based application.
Some key ones include: older TCPs missing key algorithms such as MTU
discovery, support for large windows or SACK, or mis-tuning of
either the data source or sink.  Network conditions which require
the newer TCP algorithms are detected by TReno and reported in the
ancillary results.  Other documents will cover methods to diagnose
the difference between TReno and TCP performance.

People using the TReno metric as part of procurement documents
should be aware that in many circumstances MTU has an intrinsic and
large impact on overall path performance.  Under some conditions the
difficulty in meeting a given performance specification is inversely
proportional to the square of the path MTU.  (e.g. Halving the
specified MTU makes meeting the bandwidth specification 4 times
harder.)

When used as an end-to-end metric TReno presents exactly the same
load to the network as a properly tuned state-of-the-art bulk TCP
stream between the same pair of hosts.  Although the connection is
not transferring useful data, it is no more wasteful than fetching
an unwanted web page with the same transfer time.

References

[Jacobson88] Jacobson, V., "Congestion Avoidance and Control",
Proceedings of SIGCOMM '88, Stanford, CA., August 1988.

[Mathis96] Mathis, M. and Mahdavi, J., "Forward Acknowledgment:
Refining TCP Congestion Control", Proceedings of ACM SIGCOMM '96,
Stanford, CA., August 1996.

[RFC2018] Mathis, M., Mahdavi, J.,
Floyd, S., Romanow, A., "TCP Selective Acknowledgment Options",
RFC 2018, 1996.  Obtain via:
ftp://ds.internic.net/rfc/rfc2018.txt

[Mathis97a] Mathis, M., TReno source distribution.  Obtain via:
ftp://ftp.psc.edu/pub/networking/tools/treno.shar

[Mathis97b] Mathis, M., Semke, J., Mahdavi, J., Ott, T., "The
Macroscopic Behavior of the TCP Congestion Avoidance Algorithm",
Computer Communications Review, 27(3), July 1997.

[RFC2001] Stevens, W., "TCP Slow Start, Congestion Avoidance, Fast
Retransmit, and Fast Recovery Algorithms", RFC 2001.  Obtain via:
ftp://ds.internic.net/rfc/rfc2001.txt

[facknote] Mathis, M. and Mahdavi, J., "TCP Rate-Halving with
Bounding Parameters",
http://www.psc.edu/networking/papers/FACKnotes/current/

Author's Address

Matt Mathis
email: mathis@psc.edu
Pittsburgh Supercomputing Center
4400 Fifth Ave.
Pittsburgh PA 15213