INTERNET-DRAFT              Expires June 1999             INTERNET-DRAFT

Network Working Group                                        Matt Mathis
INTERNET-DRAFT                          Pittsburgh Supercomputing Center
Expiration Date: June 1999                                   Mark Allman
                                                              NASA Lewis

                    Empirical Bulk Transfer Capacity

                < draft-ietf-ippm-btc-framework-00.txt >

Status of this Document

   This document is an Internet-Draft. Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its working groups. Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months, and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   To view the entire list of current Internet-Drafts, please check the
   "1id-abstracts.txt" listing contained in the Internet-Drafts shadow
   directories on ftp.is.co.za (Africa), nic.nordu.net (Northern
   Europe), ftp.nis.garr.it (Southern Europe), munnari.oz.au (Pacific
   Rim), ftp.ietf.org (US East Coast), or ftp.isi.edu (US West Coast).

   This memo provides information for the Internet community. This memo
   does not specify an Internet standard of any kind. Distribution of
   this memo is unlimited.

Abstract:

   Bulk Transport Capacity (BTC) is a measure of a network's ability
   to transfer significant quantities of data with a single
   congestion-aware transport connection (e.g., TCP). The intuitive
   definition of BTC is the expected long term average data rate
   (bits per second) of a single ideal TCP implementation over the
   path in question. However, there are many congestion control
   algorithms (and hence transport implementations) permitted by
   IETF standards. This diversity in transport algorithms creates a
   difficulty for standardizing BTC metrics because the allowed
   diversity is sufficient to lead to situations where different
   implementations will yield non-comparable measures -- and
   potentially fail the formal tests for being a metric.

   This document defines a framework for standardizing multiple BTC
   metrics that parallel the permitted transport diversity. Two
   approaches are used. First, each BTC metric must be much more
   tightly specified than the typical IETF protocol. Pseudo-code or
   reference implementations are expected to be the norm. Second,
   each BTC methodology is expected to collect some ancillary metrics
   which are potentially useful to support analytical models of BTC.

1. Introduction

   Bulk Transport Capacity (BTC) is a measure of a network's ability
   to transfer significant quantities of data with a single
   congestion-aware transport connection (e.g., TCP).
   For many applications the BTC of the underlying network dominates
   the overall elapsed time for the application and thus dominates the
   performance as perceived by a user. Examples of such applications
   include FTP and the World Wide Web when delivering large images or
   documents.

   The intuitive definition of BTC is the expected long term average
   data rate (bits per second) of a single ideal TCP implementation
   over the path in question.

   Central to the notion of bulk transport capacity is the idea that
   all transport protocols should have similar responses to
   congestion in the Internet. Indeed, the only form of equity
   significantly deployed in the Internet today is that the vast
   majority of all traffic is carried by TCP implementations sharing
   common congestion control algorithms, largely due to a shared
   developmental heritage.

   [RFC2001.bis] specifies the standard congestion control algorithms
   used by these TCP implementations. Even though [RFC2001.bis] is a
   (proposed) standard, it permits considerable latitude in
   implementation. This latitude is by design, to encourage ongoing
   evolution in congestion control algorithms.

   This legal diversity in transport algorithms creates a
   difficulty for standardizing BTC metrics because the allowed
   diversity is sufficient to lead to situations where different
   implementations will yield non-comparable measures -- and
   potentially fail the formal tests for being a metric.

@@@ A more serious problem is that most of the existing CC algorithms
@ do not assure that improving the properties of a path improves the
@ measure of that path. That is, existing TCP implementations do not
@ always have performance that monotonically increases with true path
@ capacity.
#
# OK. I'll leave that to you... I think it needs said and supported
# with some explanation.
# --allman
@ Next pass --MM--

   Furthermore, congestion control and related areas, including
   integrated services [@@], differentiated services [@@], and
   Internet traffic analysis [@@], are all currently receiving a lot
   of attention from the research community. It is very likely that we
   will see new experimental congestion control algorithms in the near
   future. In addition, explicit congestion notification (ECN)
   [RF98] is being tested for Internet deployment. We do not yet
   know how any of these developments might affect BTC metrics.

   This document defines a framework for standardizing multiple BTC
   metrics that parallel the permitted transport diversity. Two
   approaches are used. First, each BTC metric must be much more
   tightly specified than the typical IETF protocol. Pseudo-code or
   reference implementations are expected to be the norm. Second,
   each BTC methodology is expected to collect some ancillary metrics
   which are potentially useful to support analytical models of BTC.

   For example, the models in [PFTK98, MSMO97, OKM96a, Lak94] all
   predict bulk performance based on path properties such as loss
   rate, round trip time, etc. A BTC methodology which also provides
   ancillary measures of these properties is stronger because
   agreement with the analytical models can be used to corroborate
   the direct BTC measurement results.

   More importantly, these ancillary metrics are expected to be useful
   for resolving disparity between different BTC metrics. For
   example, a path that predominantly experiences clustered packet
   losses is likely to exhibit vastly different measures from BTC
   metrics that mimic Tahoe, Reno, NewReno, and SACK TCP
   algorithms [FF96]. The differences in the BTC metrics over
   such a path might be diagnosed by an ancillary measure of loss
   clustering.
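
   As an illustration of the corroboration described above, the
   following sketch compares a directly measured rate with a model
   prediction computed from ancillary loss rate and round trip time.
   It assumes the macroscopic form rate = (MSS/RTT) * C/sqrt(p)
   associated with [MSMO97], with C = sqrt(3/2). The function names,
   the choice of constant, and the 25% tolerance are illustrative
   assumptions and are not part of any BTC metric.

```python
import math


def model_rate_bps(mss_bytes, rtt_s, loss_rate, c=math.sqrt(1.5)):
    """Model estimate in bits/s: (MSS / RTT) * C / sqrt(p).

    The default C = sqrt(3/2) is an assumption (periodic loss, every
    segment acknowledged); other models use other constants.
    """
    if mss_bytes <= 0 or rtt_s <= 0 or not 0.0 < loss_rate <= 1.0:
        raise ValueError("need mss_bytes > 0, rtt_s > 0, 0 < loss_rate <= 1")
    return 8.0 * mss_bytes / rtt_s * c / math.sqrt(loss_rate)


def agrees_with_model(measured_bps, predicted_bps, tolerance=0.25):
    """True if the measurement is within a relative tolerance of the model.

    The tolerance is a hypothetical threshold chosen for illustration.
    """
    return abs(measured_bps - predicted_bps) <= tolerance * predicted_bps


if __name__ == "__main__":
    predicted = model_rate_bps(mss_bytes=1460, rtt_s=0.1, loss_rate=0.01)
    print(f"model prediction: {predicted / 1e6:.2f} Mbit/s")
    print("corroborated:", agrees_with_model(1.3e6, predicted))
```

   A measurement that disagrees strongly with the model is not
   necessarily wrong, but the disagreement is a prompt to inspect the
   other ancillary metrics (for example, loss clustering).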

   Furthermore, there are some path properties which are best measured
   as ancillary metrics to a transport protocol. Examples of such
   properties include bottleneck queue limits or the tendency to
   reorder packets. These are difficult or impossible to measure at
   low rates and unsafe to measure at rates higher than the bulk
   transport capacity of the path.

   It is expected that at some point in the future there will exist
   an A-frame [RFC2330] which will unify all simple path metrics
   (e.g., segment loss rates, round trip time) and BTC ancillary
   metrics (e.g., queue size and packet reordering) with different
   versions of BTC metrics (e.g., that parallel Reno or SACK TCP).

2. Congestion Control Algorithms

   Nearly all TCP implementations in use today are based on
   congestion control algorithms published in [Jac88] and further
   refined in [RFC2001, RFC2001.bis]. In addition to the basic notion
   of using an ACK clock, TCP (and therefore BTC) implements five
   standard congestion control algorithms: Congestion Avoidance,
   Retransmission Timeouts, Slow-Start, Fast Retransmit and Fast
   Recovery. All BTC implementations must use these algorithms as
   they are defined in [RFC2001.bis]. However, in all cases a BTC
   metric must more tightly specify these algorithms, as discussed
   below.

2.1 Congestion Avoidance

   The Congestion Avoidance algorithm drives the steady-state bulk
   transfer behavior of TCP. It calls for opening the congestion
   window (cwnd) by a constant additive amount during each round trip
   time (RTT), and closing it by a constant multiplicative fraction
   on congestion, as indicated by lost segments. The window is
   specified to close to half the number of outstanding data segments
   in flight when loss is detected. A BTC metric must specify the
   following Congestion Avoidance details:

   The exact algorithm for incrementing cwnd is left to the
   implementer.
   Several candidate algorithms are outlined in [RFC2001.bis]. In
   addition, some of these algorithms include some rounding. For
   these reasons, the exact algorithm for increasing cwnd during
   congestion avoidance must be fully specified for each BTC metric
   defined.

   [RFC2001.bis] permits an extra one-segment window increase
   following the multiplicative closing of cwnd. This is because
   [RFC2001.bis] allows a single invocation of the Slow-Start
   algorithm when cwnd equals ssthresh at the end of recovery.

2.2 Retransmission Timeouts

   In order to provide reliable data delivery, TCP resends a segment if
   the ACK for the given segment does not arrive before the
   retransmission timer (RTO) fires. A BTC metric must implement an
   RTO timer to trigger retransmissions not handled by the fast
   retransmit algorithm. Such retransmissions can have a large impact
   on the measured capacity. Calculating the RTO is subject to a
   number of details that are not standardized. When implementing a
   BTC metric, the details of the RTO calculation, how and when the
   clock is set, as well as the clock granularity must be fully
   documented.

2.3 Slow Start

   Slow start is part of TCP's transient behavior. It is used to
   quickly bring new or recently restarted connections up to an
   appropriate congestion window. In addition, slow start is used to
   restart the ACK clock after a retransmission timeout. A BTC
   implementation must use the slow start algorithm, as specified by
   [RFC2001.bis]. The slow start algorithm is used while the congestion
   window (cwnd) is less than the slow start threshold (ssthresh).
   However, whether to use slow start or congestion avoidance when cwnd
   equals ssthresh is left to the implementer by [RFC2001.bis]. This
   detail must be specified in every specific BTC metric definition.
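
   The choices that Sections 2.1 and 2.3 leave open can be made
   concrete in a few lines. The sketch below is one possible set of
   choices, not a normative definition: it assumes the byte-based
   increment cwnd += SMSS*SMSS/cwnd (one commonly described candidate),
   rounds a zero increment up to one byte, and exposes the
   cwnd == ssthresh decision as an explicit parameter. All names are
   hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CwndState:
    smss: int                       # sender maximum segment size, bytes
    cwnd: int                       # congestion window, bytes
    ssthresh: int                   # slow start threshold, bytes
    slow_start_on_tie: bool = True  # hypothetical knob: cwnd == ssthresh


def on_new_ack(state):
    """Open the window for one ACK that acknowledges new data."""
    in_slow_start = state.cwnd < state.ssthresh or (
        state.cwnd == state.ssthresh and state.slow_start_on_tie
    )
    if in_slow_start:
        # Slow start: one full segment per ACK.
        state.cwnd += state.smss
    else:
        # Congestion avoidance: roughly one segment per round trip.
        # Integer division is the rounding choice made in this sketch;
        # a zero result is rounded up to one byte.
        state.cwnd += max(1, state.smss * state.smss // state.cwnd)
    return state.cwnd


def on_loss(state, flight_size):
    """Multiplicative closing after loss detected by Fast Retransmit.

    Simplification: cwnd is set directly to its post-recovery value,
    and the optional extra one-segment increase is not applied.
    """
    state.ssthresh = max(flight_size // 2, 2 * state.smss)
    state.cwnd = state.ssthresh
    return state.cwnd


if __name__ == "__main__":
    s = CwndState(smss=1460, cwnd=1460, ssthresh=8 * 1460)
    for _ in range(10):
        on_new_ack(s)
    print("cwnd after 10 ACKs:", s.cwnd)
    print("cwnd after loss:", on_loss(s, flight_size=s.cwnd))
```

   Two BTC metrics that differ only in slow_start_on_tie or in the
   rounding rule are different metrics under this framework, which is
   why each such choice has to be written down.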

2.4 Fast Retransmit/Fast Recovery

   The Fast Retransmit/Fast Recovery algorithms are used to infer
   segment loss before the RTO expires. A BTC implementation must
   implement the algorithms as defined in [RFC2001.bis].

   In Reno TCP, Fast Retransmit and Fast Recovery are used to support
   the Congestion Avoidance algorithm during recovery from lost
   segments. During Fast Recovery, the data receiver sends duplicate
   acknowledgments. The data sender uses these duplicate ACKs to
   detect loss, to estimate the quantity of data in the network still
   pending delivery and to clock out new data in an effort to keep the
   ACK clock running.

2.5 Advanced Recovery Algorithms

   It has been observed that under some conditions the Fast
   Retransmit and Fast Recovery algorithms do not reliably preserve
   TCP's Self-Clock, causing unpredictable or unstable TCP
   performance [Lak94@@@check, Flo95]. Simulations of reference TCP
   implementations have uncovered situations where incidental changes
   in other parts of the network have a large effect on performance
   [MM96a]. Other simulations have shown that under some
   conditions, slightly better networks (higher bandwidth, lower
   delay or less load from other connections) yield lower throughput.
@@@ This is pretty easy to construct, but has it been published?
# Not that I can think of off the top of my head... Maybe a concrete
# example to back up the claim? --allman

   [RFC2001.bis] allows a TCP implementation to use more robust loss
   recovery algorithms, such as NewReno type algorithms
   [FH98, FF96, Hoe96] and SACK-based algorithms [FF96, MM96a, MM96b].
   While allowing these algorithms, [RFC2001.bis] does not define any
   such algorithm; therefore, a BTC metric that implements
   advanced recovery algorithms must fully specify the details.

   Note that since TCP based on standard Fast Retransmit and Fast
   Recovery sometimes exhibits erratic performance [MM96a], these
   algorithms may prove to be unsuitable for use in a metric.
# Ouch... I know what you're saying, but... If the goal is to see what
# congestion-aware transport connection yields, I think the above is a
# little harsh given the current standardized CC algorithms.

2.6 Segment Size

   The actual segment size, or method of choosing a segment size
   (e.g., path MTU discovery [RFC1191]), and the number of header
   bytes assumed to be prepended to each segment must be specified.
   In addition, if the segment size is artificially limited to less
   than the path MTU, this must be indicated.

3. Ancillary Metrics

   The following ancillary metrics should be implemented in every BTC
   that can exhibit the relevant behaviors. Alternatively, the BTC
   implementation should provide enough information that the following
   information can be gathered in post-processing (e.g., by providing a
   segment trace of the connection).

3.1 Congestion Avoidance Capacity

   Define a pure "Congestion Avoidance Capacity" (CAC) metric to be
   the data rate (bits per second) of a fully specified
   implementation of the Congestion Avoidance algorithm, subject to
   the restriction that the Retransmission Timeout and Slow-Start
   algorithms are not invoked. The CAC metric is defined to have no
   meaning across Retransmission Timeouts or Slow-Start (except the
   single segment Slow-Start that is permitted to follow recovery).

   In principle a CAC metric would be an ideal BTC metric. But there
   is a rather substantial difficulty with using it as such. The
   Self-Clocking of the Congestion Avoidance algorithm can be very
   fragile, depending on the specific details of the Fast Retransmit,
   Fast Recovery or advanced recovery algorithms above.

   When TCP loses Self-Clock, it is reestablished through a
   retransmission timeout and Slow-Start. These algorithms nearly
   always take more time than Congestion Avoidance would have taken.

   It is easily observed that unless the network loses an entire
   window of data (which would clearly require a retransmit timeout),
   TCP missed some opportunity to send data. That is, if TCP
   experiences a timeout after losing any partial window of data, it
   must have received at least one ACK that was generated after some
   of the partial data was delivered, but did not trigger
   transmitting any new data. Much recent research in congestion
   control (e.g., FACK [MM96a], NewReno [FH98], [LowWindow]) can be
   characterized as making TCP's Self-Clock more tenacious, while
   preserving fairness under adverse conditions. This work is often
   motivated by how poorly current TCP implementations perform under
   some conditions, often due to repeated clock loss. Since this is
   an active research area, different TCP implementations have rather
   considerable differences in their ability to preserve Self-Clock.

3.2 Ancillary metrics relating to the preservation of Self-Clock

   Since losing the clock can have a large effect on the overall BTC,
   and the clock is itself fragile in ways that are very dependent on
   the recovery algorithm, it is important that the transitions between
   timer driven and Self-Clocked operation be instrumented.

3.2.1 Lost transmission opportunities

   If the last event before a timeout was the receipt of an ACK that
   did not trigger a retransmission, the possibility exists that
   some other congestion control algorithm would have successfully
   preserved the Self-Clock. In this event, instrumenting key parts
   of the BTC state (e.g., cwnd) may lead to further improvements in
   congestion control algorithms.

   Note that in the absence of knowledge about the future, it is not
   possible to design an algorithm that never misses transmission
   opportunities. However, there are ever more subtle ways to gauge
   network state, and to estimate if a given ACK is likely to be the
   last.

3.2.2 Losing an entire window

   If an entire window of data (or ACKs) is lost, there will be no
   returning ACKs to clock out additional data. This condition can
   be detected if the last event before a timeout was a data
   transmission triggered by an ACK. The loss of an entire window
   of data/ACKs forces recovery to be via a Retransmission Timeout and
   Slow-Start.

   Losing an entire window of data implies an outage with a duration
   at least as long as a round trip time. Such an outage cannot be
   diagnosed with low rate metrics and is unsafe to diagnose at
   higher rates than the BTC. Therefore all BTC metrics should
   instrument and report losses of an entire window of data.

   There are some conditions, such as at very small windows, in which
   there is a significant probability that an entire window can be
   legitimately lost through individual random losses.

3.2.3 Heroic clock preservation

   All algorithms that permit a given BTC to sustain Self-Clock when
   other algorithms might not should be instrumented. Furthermore,
   the details of the algorithms used must be fully documented.

   BTC metrics that can sustain Self-Clock in the presence of
   multiple losses within one round trip should instrument the
   loss distribution, such that the performance of Reno style
   bulk transport can be estimated.

   BTC algorithms that can trigger fast retransmits earlier than
   following three duplicate acknowledgments (e.g., at small
   windows [LowWindow]) should instrument and fully document
   these events as well.
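
   The two timeout cases in Sections 3.2.1 and 3.2.2 are distinguished
   by the last event seen before the timer fires. The following sketch
   shows one way an instrumented sender could tally them from an event
   log. The event names and the log format are hypothetical; a real
   BTC implementation would record richer state (e.g., cwnd) with
   each event.

```python
from enum import Enum


class Event(Enum):
    ACK_NO_SEND = "ack_no_send"                # ACK arrived, nothing was sent
    ACK_TRIGGERED_SEND = "ack_triggered_send"  # ACK arrived and clocked out data
    TIMEOUT = "timeout"                        # retransmission timer fired


def classify_timeouts(events):
    """Count timeouts by the event that immediately preceded each one."""
    counts = {"lost_opportunity": 0, "entire_window_lost": 0, "other": 0}
    previous = None
    for event in events:
        if event is Event.TIMEOUT:
            if previous is Event.ACK_NO_SEND:
                # Section 3.2.1: an ACK arrived but sent nothing.
                counts["lost_opportunity"] += 1
            elif previous is Event.ACK_TRIGGERED_SEND:
                # Section 3.2.2: data was sent, then no ACK ever returned.
                counts["entire_window_lost"] += 1
            else:
                # First event in the log, or back-to-back timeouts.
                counts["other"] += 1
        previous = event
    return counts


if __name__ == "__main__":
    log = [
        Event.ACK_TRIGGERED_SEND,
        Event.ACK_NO_SEND,
        Event.TIMEOUT,
        Event.ACK_TRIGGERED_SEND,
        Event.TIMEOUT,
    ]
    print(classify_timeouts(log))
```

   Keeping a separate "other" bucket avoids forcing consecutive
   timeouts into either category.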

3.2.4 False timeouts

   All false timeouts (where the transmission timer expires before
   the ACK for some previously transmitted data arrives) should be
   instrumented when possible. Note that depending upon how the BTC
   metric implements sequence numbers, this may be difficult to
   detect.

3.3 Ancillary metrics relating to flow based path properties

   All BTC metrics provide unique vantage points for instrumenting
   certain path properties relating to closely spaced packets. As in
   the case of RTT duration outages, these can be impossible to
   diagnose at low rates (less than 1 packet per RTT) and
   inappropriate to test at rates above the BTC.

   All BTC metrics should instrument packet reordering. The severity
   of the reordering can be classified as one of three different
   cases, each of which should be instrumented.

   Packets that are only slightly out of order should not trigger
   retransmission, but they may affect the window calculation.
   BTC metrics must document how slightly out-of-order packets
   affect the congestion window calculation. The frequency and
   distance out of sequence must be instrumented for all
   out-of-order packets.

   If packets are sufficiently out-of-order, the Fast Retransmit
   algorithm will be invoked in advance of the delayed packet's
   late arrival. These events must be instrumented.
   Even though the late-arriving packet will complete
   recovery, the window must still be reduced by half.

   Under some rare conditions packets have been observed that are
   far out of order - sometimes many seconds late [Pax97b].
   These should always be instrumented.

   The BTC should instrument the maximum cwnd observed during
   congestion avoidance and slow start. A TCP running over the same
   path must have sufficient sender buffer space and receiver window
   (and window shift [RFC1323]) to cover this cwnd.
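
   The three reordering cases described earlier in this section can be
   sketched as a classifier over a receiver-side arrival trace. This
   is an illustration under stated assumptions: the trace format, the
   use of "number of later segments already received" as the distance
   out of sequence, and the one-second boundary for "far out of order"
   are all hypothetical choices, and retransmitted segments are
   ignored.

```python
def classify_reordering(arrivals, dupack_threshold=3, far_late_s=1.0):
    """Classify out-of-order arrivals.

    arrivals: list of (segment_index, arrival_time_s) in arrival order.
    Returns a list of (segment_index, distance, lateness_s, kind), where
    kind is "slight", "fast_retransmit" or "far".
    """
    seen = []      # (segment_index, arrival_time_s) already received
    results = []
    for index, time_s in arrivals:
        # Arrival times of higher-numbered segments that got here first.
        overtaken_by = [t for (i, t) in seen if i > index]
        if overtaken_by:
            distance = len(overtaken_by)
            lateness = time_s - min(overtaken_by)
            if lateness >= far_late_s:
                kind = "far"
            elif distance >= dupack_threshold:
                # Enough later segments to have produced the duplicate
                # ACKs that invoke Fast Retransmit.
                kind = "fast_retransmit"
            else:
                kind = "slight"
            results.append((index, distance, lateness, kind))
        seen.append((index, time_s))
    return results


if __name__ == "__main__":
    trace = [(0, 0.00), (2, 0.01), (1, 0.02), (4, 0.03), (5, 0.04),
             (6, 0.05), (3, 0.06), (8, 0.07), (7, 5.00)]
    for index, distance, lateness, kind in classify_reordering(trace):
        print(index, distance, round(lateness, 2), kind)
```

   Reporting the distance and lateness alongside the class keeps the
   frequency and distance out of sequence available for every
   out-of-order packet, as required above.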

   There are several other path properties that one might measure
   within a BTC metric. For example, with an embedded one-way delay
   metric it may be possible to measure how queueing delay and
   (RED) drop probabilities are correlated to window size.
   These are all open research questions.

3.4 Ancillary metrics pertaining to MTU discovery

   Under some conditions, BTC can be very sensitive to segment size.
   In addition to instrumenting the segment size, a BTC metric should
   indicate how it was selected: by path MTU discovery [RFC1191], a
   manual control, system default, or the maximum MTU for the
   interface.

   Note that the most popular LAN technologies have smaller MTUs
   than nearly all WAN technologies. As a consequence, it is
   difficult to measure the true performance of a wide area path
   without subjecting it to the smaller MTU of the LAN.

3.4 Ancillary metrics as calibration checks

   Unlike low rate metrics, BTC must have explicit checks that the
   test platform is not the bottleneck, either due to insufficient
   tester data rate or buffer space.

   Ideally all queues within the tester should be instrumented. All
   packets dropped within the tester should be instrumented as tester
   failures, invalidating a measurement.

   The maximum queue lengths should be instrumented. Any significant
   queue may indicate that the tester itself has insufficient burst
   data rate, and is slightly smoothing the data into the network.

3.4.3 Validate Reverse path load

@@@@ What happens to a BTC when the reverse path is congested? Is
     this identical to TCP? What should happen? How should it be
     instrumented?
#
# Some implementations (mine!) have an annoying feature whereby ACK loss
# looks just like data loss. This should be documented.
# If ACK loss and data loss can be detected separately, I think ACK
# loss rate should be reported, as it slightly changes the ACK clock
# (can impact algorithms like slow start that work on a per ACK basis
# and can make the sender more bursty, which could cause more loss).
@ and mine --MM--

3.5 Ancillary metrics relating to the need for advanced TCP features

   If TCP would require [RFC1323] features (window scaling, timestamp
   based round trip time measurement, protection from wrapped
   sequences, etc.) to match the BTC performance, this should be
   reported.

4. Acknowledgments

   Jeff Semke, for numerous clarifications.

5. References

   [LowWindow] @@@@@ Current work

   [FF96] Fall, K., Floyd, S., "Simulation-based Comparisons of Tahoe,
   Reno and SACK TCP", Computer Communication Review, July 1996.
   ftp://ftp.ee.lbl.gov/papers/sacks.ps.Z.

   [FH98] Floyd, S., Henderson, T., "The NewReno Modification to
   TCP's Fast Recovery Algorithm", Work in progress
   draft-ietf-tcpimpl-newreno-00.txt

   [Flo95] Floyd, S., "TCP and successive fast retransmits",
   March 1995, Obtain via ftp://ftp.ee.lbl.gov/papers/fastretrans.ps.

   [RF98] Ramakrishnan, K., Floyd, S., "A Proposal to add Explicit
   Congestion Notification (ECN) to IP", Work in progress
   draft-kksjf-ecn-03.txt

   [Hoe96] Hoe, J., "Improving the start-up behavior of a congestion
   control scheme for TCP", Proceedings of ACM SIGCOMM '96,
   August 1996.

   [Hoe95] Hoe, J., "Startup dynamics of TCP's congestion control
   and avoidance schemes", Master's thesis, Massachusetts Institute
   of Technology, June 1995.

   [Jac88] Jacobson, V., "Congestion Avoidance and Control",
   Proceedings of SIGCOMM '88, Stanford, CA., August 1988.

   [Lak94] Lakshman, Effects of random loss

   [MM96a] Mathis, M., Mahdavi, J., "Forward acknowledgment:
   Refining TCP congestion control", Proceedings of ACM SIGCOMM '96,
   Stanford, CA., August 1996.

   [MM96b] M.
   Mathis, J. Mahdavi, "TCP Rate-Halving with Bounding
   Parameters", Available from
   http://www.psc.edu/networking/papers/FACKnotes/current.

   [MSMO97] Mathis, M., Semke, J., Mahdavi, J., Ott, T.,
   "The Macroscopic Behavior of the TCP Congestion Avoidance
   Algorithm", Computer Communications Review, 27(3), July 1997.

   [OKM96a] Ott, T., Kemperman, J., Mathis, M., "The Stationary
   Behavior of Ideal TCP Congestion Avoidance", In progress, August
   1996. Obtain via pub/tjo/TCPwindow.ps using anonymous ftp to
   ftp.bellcore.com

   [OKM96b] Ott, T., Kemperman, J., Mathis, M., "Window Size
   Behavior in TCP/IP with Constant Loss Probability", DIMACS
   Special Year on Networks, Workshop on Performance of Real-Time
   Applications on the Internet, Nov 1996.

   [Pax97a] Paxson, V., "Automated Packet Trace Analysis of TCP
   Implementations", Proceedings of ACM SIGCOMM '97, August 1997.

   [Pax97b] Paxson, V., "End-to-End Internet Packet Dynamics",
   Proceedings of SIGCOMM '97, Cannes, France, Sep. 1997.

   [Pax97c] Paxson, V., editor, "Known TCP Implementation Problems",
   Work in progress: http://reality.sgi.com/sca/tcp-impl/prob-01.txt

   [PFTK98] Padhye, J., Firoiu, V., Towsley, D., Kurose, J., "TCP
   Throughput: A Simple Model and its Empirical Validation",
   Proceedings of ACM SIGCOMM '98, August 1998.

   [RFC1191] Mogul, J., Deering, S., "Path MTU Discovery",
   November 1990, Obtain via:
   ftp://ds.internic.net/rfc/rfc1191.txt

   [RFC1323] Jacobson, V., Braden, R., Borman, D., "TCP Extensions
   for High Performance", May 1992, Obtain via:
   ftp://ds.internic.net/rfc/rfc1323.txt

   [RFC2001] Stevens, W., "TCP Slow Start, Congestion Avoidance,
   Fast Retransmit, and Fast Recovery Algorithms",
   ftp://ds.internic.net/rfc/rfc2001.txt

   [RFC2001.bis] Allman, M., Paxson, V., Stevens, W., "TCP Congestion
   Control", Work in progress draft-ietf-cong-control-01.txt, to
   update RFC2001.

   [RFC2018] Mathis, M., Mahdavi, J., Floyd, S., Romanow, A., "TCP
   Selective Acknowledgment Options", 1996, Obtain via:
   ftp://ds.internic.net/rfc/rfc2018.txt

   [RFC2330] Paxson, V., Almes, G., Mahdavi, J., Mathis, M.,
   "Framework for IP Performance Metrics", 1998, Obtain via:
   ftp://ds.internic.net/rfc/rfc2330.txt

   [Ste94] Stevens, W., "TCP/IP Illustrated, Volume 1: The
   Protocols", Addison-Wesley, 1994.

Authors' Addresses

   Matt Mathis
   Pittsburgh Supercomputing Center
   4400 Fifth Ave.
   Pittsburgh PA 15213
   mathis@psc.edu
   http://www.psc.edu/~mathis

   Mark Allman
   NASA Lewis Research Center/Sterling Software
   21000 Brookpark Rd. MS 54-2
   Cleveland, OH 44135
   216-433-6586
   mallman@lerc.nasa.gov
   http://gigahertz.lerc.nasa.gov/~mallman