idnits 2.17.1 draft-ietf-tcpm-rfc3782-bis-01.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** The document is more than 15 pages and seems to lack a Table of Contents. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** There are 266 instances of too long lines in the document, the longest one being 20 characters in excess of 72. ** The abstract seems to contain references ([RFC5681], [RFC2883], [RFC3782], [Hoe95]), which it shouldn't. Please replace those with straight textual mentions of the documents in question. -- The draft header indicates that this document obsoletes RFC3782, but the abstract doesn't seem to directly say this. It does mention RFC3782 though, so this could be OK. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year == The document seems to lack the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. (The document does seem to have the reference to RFC 2119 which the ID-Checklist requires). == The document seems to contain a disclaimer for pre-RFC5378 work, but was first submitted on or after 10 November 2008. The disclaimer is usually necessary only for documents that revise or obsolete older RFCs, and that take significant amounts of text from those RFCs. If you can contact all authors of the source material and they are willing to grant the BCP78 rights to the IETF Trust, you can and should remove the disclaimer. Otherwise, the disclaimer is needed and you can ignore this comment. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (March 14, 2011) is 4785 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) ** Obsolete normative reference: RFC 2988 (Obsoleted by RFC 6298) -- Obsolete informational reference (is this intentional?): RFC 2001 (ref. 'F98') (Obsoleted by RFC 2581) -- Obsolete informational reference (is this intentional?): RFC 1323 (Obsoleted by RFC 7323) -- Obsolete informational reference (is this intentional?): RFC 2582 (Obsoleted by RFC 3782) -- Obsolete informational reference (is this intentional?): RFC 3782 (Obsoleted by RFC 6582) Summary: 4 errors (**), 0 flaws (~~), 3 warnings (==), 6 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group T. Henderson 3 Internet-Draft Boeing 4 Obsoletes: 3782 (if approved) S. Floyd 5 Intended status: Standards Track ICSI 6 Expires: September 15, 2011 A. Gurtov 7 HIIT 8 Y. Nishida 9 WIDE Project 10 March 14, 2011 12 The NewReno Modification to TCP's Fast Recovery Algorithm 13 draft-ietf-tcpm-rfc3782-bis-01.txt 15 Abstract 17 RFC 5681 [RFC5681] documents the following four intertwined TCP 18 congestion control algorithms: Slow Start, Congestion Avoidance, Fast 19 Retransmit, and Fast Recovery. RFC 5681 explicitly allows 20 certain modifications of these algorithms, including modifications 21 that use the TCP Selective Acknowledgement (SACK) option [RFC2883], 22 and modifications that respond to "partial acknowledgments" (ACKs 23 which cover new data, but not all the data outstanding when loss was 24 detected) in the absence of SACK. This document describes a specific 25 algorithm for responding to partial acknowledgments, referred to as 26 NewReno. This response to partial acknowledgments was first proposed 27 by Janey Hoe in [Hoe95]. 29 The purpose of this revision from [RFC3782] is to make errata changes 30 and to adopt a proposal from Yoshifumi Nishida to slightly increase 31 the minimum window size after Fast Recovery from one to two segments, 32 to improve performance when the receiver uses delayed acknowledgments. 34 Status of this Memo 36 This Internet-Draft is submitted to IETF in full conformance with the 37 provisions of BCP 78 and BCP 79. 39 Internet-Drafts are working documents of the Internet Engineering 40 Task Force (IETF). Note that other groups may also distribute 41 working documents as Internet-Drafts. The list of current Internet- 42 Drafts is at http://datatracker.ietf.org/drafts/current/. 44 Internet-Drafts are draft documents valid for a maximum of six months 45 and may be updated, replaced, or obsoleted by other documents at any 46 time. It is inappropriate to use Internet-Drafts as reference 47 material or to cite them other than as "work in progress." 48 This Internet-Draft will expire on September 15, 2011. 50 Copyright Notice 52 Copyright (c) 2011 IETF Trust and the persons identified as 53 the document authors. All rights reserved. 55 This document is subject to BCP 78 and the IETF Trust's Legal 56 Provisions Relating to IETF Documents 57 (http://trustee.ietf.org/license-info) in effect on the date of 58 publication of this document. Please review these documents 59 carefully, as they describe your rights and restrictions with respect 60 to this document. Code Components extracted from this document must 61 include Simplified BSD License text as described in Section 4.e of 62 the Trust Legal Provisions and are provided without warranty as 63 described in the Simplified BSD License. 65 This document may contain material from IETF Documents or IETF 66 Contributions published or made publicly available before November 67 10, 2008. The person(s) controlling the copyright in some of this 68 material may not have granted the IETF Trust the right to allow 69 modifications of such material outside the IETF Standards Process. 70 Without obtaining an adequate license from the person(s) controlling 71 the copyright in such materials, this document may not be modified 72 outside the IETF Standards Process, and derivative works of it may 73 not be created outside the IETF Standards Process, except to format 74 it for publication as an RFC or to translate it into languages other 75 than English. 77 1. Introduction 79 For the typical implementation of the TCP Fast Recovery algorithm 80 described in [RFC5681] (first implemented in the 1990 BSD Reno 81 release, and referred to as the Reno algorithm in [FF96]), the TCP 82 data sender only retransmits a packet after a retransmit timeout has 83 occurred, or after three duplicate acknowledgments have arrived 84 triggering the Fast Retransmit algorithm. A single retransmit 85 timeout might result in the retransmission of several data packets, 86 but each invocation of the Fast Retransmit algorithm in RFC 5681 87 leads to the retransmission of only a single data packet. 89 Two problems arise with Reno TCP when multiple packet losses occur 90 in a single window. First, Reno will often take a timeout, as 91 has been documented in [Hoe95]. Second, even if a retransmission 92 timeout is avoided, multiple fast retransmits and window reductions 93 can occur, as documented in [F94]. When multiple packet losses 94 occur, if the SACK option [RFC2883] is available, the TCP sender 95 has the information to make intelligent decisions about which packets 96 to retransmit and which packets not to retransmit during Fast 97 Recovery. This document applies to TCP connections that are 98 unable to use the TCP Selective Acknowledgement (SACK) option, 99 either because the option is not locally supported or 100 because the TCP peer did not indicate a willingness to use SACK. 102 In the absence of SACK, there is little information available to the 103 TCP sender in making retransmission decisions during Fast 104 Recovery. From the three duplicate acknowledgments, the sender 105 infers a packet loss, and retransmits the indicated packet. After 106 this, the data sender could receive additional duplicate 107 acknowledgments, as the data receiver acknowledges additional data 108 packets that were already in flight when the sender entered Fast 109 Retransmit. 111 In the case of multiple packets dropped from a single window of data, 112 the first new information available to the sender comes when the 113 sender receives an acknowledgment for the retransmitted packet (that 114 is, the packet retransmitted when Fast Retransmit was first 115 entered). If there is a single packet drop and no reordering, then the 116 acknowledgment for this packet will acknowledge all of the packets 117 transmitted before Fast Retransmit was entered. However, if there 118 are multiple packet drops, then the acknowledgment for the 119 retransmitted packet will acknowledge some but not all of the packets 120 transmitted before the Fast Retransmit. We call this acknowledgment 121 a partial acknowledgment. 123 Along with several other suggestions, [Hoe95] suggested that during 124 Fast Recovery the TCP data sender responds to a partial 125 acknowledgment by inferring that the next in-sequence packet has been 126 lost, and retransmitting that packet. This document describes a 127 modification to the Fast Recovery algorithm in RFC 5681 that 128 incorporates a response to partial acknowledgments received during 129 Fast Recovery. We call this modified Fast Recovery algorithm 130 NewReno, because it is a slight but significant variation of the 131 basic Reno algorithm in RFC 5681. This document does not discuss the 132 other suggestions in [Hoe95] and [Hoe96], such as a change to the 133 ssthresh parameter during Slow-Start, or the proposal to send a new 134 packet for every two duplicate acknowledgments during Fast 135 Recovery. The version of NewReno in this document also draws on other 136 discussions of NewReno in the literature [LM97, Hen98]. 138 We do not claim that the NewReno version of Fast Recovery described 139 here is an optimal modification of Fast Recovery for responding to 140 partial acknowledgments, for TCP connections that are unable to use 141 SACK. Based on our experiences with the NewReno modification in the 142 NS simulator [NS] and with numerous implementations of NewReno, we 143 believe that this modification improves the performance of the Fast 144 Retransmit and Fast Recovery algorithms in a wide variety of 145 scenarios. 147 2. Terminology and Definitions 149 In this document, the key words "MUST", "MUST NOT", "REQUIRED", 150 "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", 151 and "OPTIONAL" are to be interpreted as described in BCP 14, RFC 2119 152 [RFC2119]. This RFC indicates requirement levels for compliant TCP 153 implementations implementing the NewReno Fast Retransmit and Fast 154 Recovery algorithms described in this document. 156 This document assumes that the reader is familiar with the terms 157 SENDER MAXIMUM SEGMENT SIZE (SMSS), CONGESTION WINDOW (cwnd), and 158 FLIGHT SIZE (FlightSize) defined in [RFC5681]. FLIGHT SIZE is 159 defined as in [RFC5681] as follows: 161 FLIGHT SIZE: 162 The amount of data that has been sent but not yet cumulatively 163 acknowledged. 165 3. The Fast Retransmit and Fast Recovery Algorithms in NewReno 167 The basic idea of these extensions to the Fast Retransmit and 168 Fast Recovery algorithms described in [RFC5681] is as follows. 169 The TCP sender can infer, from the arrival of duplicate 170 acknowledgments, whether multiple losses in the same window of 171 data have most likely occurred, and avoid taking a retransmit 172 timeout or making multiple congestion window reductions due to 173 such an event. 175 The standard implementation of the Fast Retransmit and Fast Recovery 176 algorithms is given in [RFC5681]. This section specifies the basic 177 NewReno algorithm. Section 4 describes heuristics for processing 178 duplicate acknowledgments after a retransmission timeout. Sections 179 5 and 6 provide some guidance to implementors based on experience 180 with NewReno implementations. Several appendices provide more 181 background information and describe variations that an implementor 182 may want to consider when tuning performance for certain network 183 scenarios. 185 The NewReno modification applies to the Fast Recovery procedure that 186 begins when three duplicate ACKs are received and ends when either a 187 retransmission timeout occurs or an ACK arrives that acknowledges all 188 of the data up to and including the data that was outstanding when 189 the Fast Recovery procedure began. 191 The NewReno algorithm specified in this document extends the 192 implementation in [RFC5681] by introducing a variable specified as 193 "recover" whose initial value is the initial send sequence number. 194 This new variable is used by the sender to record the send sequence 195 number that must be acknowledged before the Fast Recovery 196 procedure is declared to be over. This variable is used below 197 in step 1, in the response to a partial or new 198 acknowledgment in step 5, and in modifications to step 1 and the 199 addition of step 6 for avoiding multiple Fast Retransmits caused by 200 the retransmission of packets already received by the receiver. 202 1) Three duplicate ACKs: 203 When the third duplicate ACK is received and the sender is not 204 already in the Fast Recovery procedure, check to see if the 205 Cumulative Acknowledgment field covers more than 206 "recover". If so, go to Step 1A. Otherwise, go to Step 1B. 208 1A) Invoking Fast Retransmit: 209 If so, then set ssthresh to no more than the value given in 210 equation 1 below. (This is equation 4 from [RFC5681]). 212 ssthresh = max (FlightSize / 2, 2*SMSS) (1) 214 In addition, record the highest sequence number transmitted in 215 the variable "recover", and go to Step 2. 217 1B) Not invoking Fast Retransmit: 218 Do not enter the Fast Retransmit and Fast Recovery procedure. In 219 particular, do not change ssthresh, do not go to Step 2 to 220 retransmit the "lost" segment, and do not execute Step 3 upon 221 subsequent duplicate ACKs. 223 2) Entering Fast Retransmit: 224 Retransmit the lost segment and set cwnd to ssthresh plus 225 3*SMSS. This artificially "inflates" the congestion window by the 226 number of segments (three) that have left the network and the 227 receiver has buffered. 229 3) Fast Recovery: 230 For each additional duplicate ACK received while in Fast 231 Recovery, increment cwnd by SMSS. This artificially inflates 232 the congestion window in order to reflect the additional segment 233 that has left the network. 235 4) Fast Recovery, continued: 236 Transmit a segment, if allowed by the new value of cwnd and the 237 receiver's advertised window. 239 5) When an ACK arrives that acknowledges new data, this ACK could be 240 the acknowledgment elicited by the retransmission from step 2, or 241 elicited by a later retransmission. 243 Full acknowledgments: 244 If this ACK acknowledges all of the data up to and including 245 "recover", then the ACK acknowledges all the intermediate 246 segments sent between the original transmission of the lost 247 segment and the receipt of the third duplicate ACK. Set cwnd to 248 either (1) min (ssthresh, max(FlightSize, SMSS) + SMSS) or 249 (2) ssthresh, where ssthresh is the value set in step 1; this is 250 termed "deflating" the window. (We note that "FlightSize" in step 1 251 referred to the amount of data outstanding in step 1, when Fast 252 Recovery was entered, while "FlightSize" in step 5 refers to the 253 amount of data outstanding in step 5, when Fast Recovery is 254 exited.) If the second option is selected, the implementation 255 is encouraged to take measures to avoid a possible burst of 256 data, in case the amount of data outstanding in the network is 257 much less than the new congestion window allows. A simple mechanism 258 is to limit the number of data packets that can be sent in response 259 to a single acknowledgment. Exit the Fast Recovery procedure. 261 Partial acknowledgments: 262 If this ACK does *not* acknowledge all of the data up to and 263 including "recover", then this is a partial ACK. In this case, 264 retransmit the first unacknowledged segment. Deflate the 265 congestion window by the amount of new data acknowledged by the 266 cumulative acknowledgment field. If the partial ACK 267 acknowledges at least one SMSS of new data, then add back SMSS 268 bytes to the congestion window. As in Step 3, this artificially 269 inflates the congestion window in order to reflect the additional 270 segment that has left the network. Send a new segment if 271 permitted by the new value of cwnd. This "partial window 272 deflation" attempts to ensure that, when Fast Recovery eventually 273 ends, approximately ssthresh amount of data will be outstanding 274 in the network. Do not exit the Fast Recovery procedure (i.e., 275 if any duplicate ACKs subsequently arrive, execute Steps 3 and 276 4 above). 278 For the first partial ACK that arrives during Fast Recovery, also 279 reset the retransmit timer. Timer management is discussed in 280 more detail in Section 4. 282 6) Retransmit timeouts: 283 After a retransmit timeout, record the highest sequence number 284 transmitted in the variable "recover" and exit the Fast 285 Recovery procedure if applicable. 287 Step 1 specifies a check that the Cumulative Acknowledgment field 288 covers more than "recover". Because the acknowledgment field 289 contains the sequence number that the sender next expects to receive, 290 the acknowledgment "ack_number" covers more than "recover" when: 292 ack_number - 1 > recover; 294 i.e., at least one byte more of data is acknowledged beyond the 295 highest byte that was outstanding when Fast Retransmit was last 296 entered. 298 Note that in Step 5, the congestion window is deflated after a 299 partial acknowledgment is received. The congestion window was 300 likely to have been inflated considerably when the partial 301 acknowledgment was received. In addition, depending on the original 302 pattern of packet losses, the partial acknowledgment might 303 acknowledge nearly a window of data. In this case, if the congestion 304 window was not deflated, the data sender might be able to send nearly 305 a window of data back-to-back. 307 This document does not specify the sender's response to duplicate 308 ACKs when the Fast Retransmit/Fast Recovery algorithm is not 309 invoked. This is addressed in other documents, such as those 310 describing the Limited Transmit procedure [RFC3042]. This document 311 also does not address issues of adjusting the duplicate acknowledgment 312 threshold, but assumes the threshold specified in the IETF standards; 313 the current standard is RFC 5681, which specifies a threshold of three 314 duplicate acknowledgments. 316 As a final note, we would observe that in the absence of the SACK 317 option, the data sender is working from limited information. When 318 the issue of recovery from multiple dropped packets from a single 319 window of data is of particular importance, the best alternative 320 would be to use the SACK option. 322 4. Handling Duplicate Acknowledgments After A Timeout 324 After each retransmit timeout, the highest sequence number 325 transmitted so far is recorded in the variable "recover". 326 If, after a retransmit timeout, the TCP data sender retransmits three 327 consecutive packets that have already been received by the data 328 receiver, then the TCP data sender will receive three duplicate 329 acknowledgments that do not cover more than "recover". In this 330 case, the duplicate acknowledgments are not an indication of a new 331 instance of congestion. They are simply an indication that the 332 sender has unnecessarily retransmitted at least three packets. 334 However, when a retransmitted packet is itself dropped, the sender 335 can also receive three duplicate acknowledgments that do not cover 336 more than "recover". In this case, the sender would have been 337 better off if it had initiated Fast Retransmit. For a TCP that 338 implements the algorithm specified in Section 3 of this document, the 339 sender does not infer a packet drop from duplicate acknowledgments 340 in this scenario. As always, the retransmit timer is the backup 341 mechanism for inferring packet loss in this case. 343 There are several heuristics, based on timestamps or on the amount of 344 advancement of the cumulative acknowledgment field, that allow the 345 sender to distinguish, in some cases, between three duplicate 346 acknowledgments following a retransmitted packet that was dropped, 347 and three duplicate acknowledgments from the unnecessary 348 retransmission of three packets [Gur03, GF04]. The TCP sender MAY use 349 such a heuristic to decide to invoke a Fast Retransmit in some cases, 350 even when the three duplicate acknowledgments do not cover more than 351 "recover". 353 For example, when three duplicate acknowledgments are caused by the 354 unnecessary retransmission of three packets, this is likely to be 355 accompanied by the cumulative acknowledgment field advancing by at 356 least four segments. Similarly, a heuristic based on timestamps uses 357 the fact that when there is a hole in the sequence space, the 358 timestamp echoed in the duplicate acknowledgment is the timestamp of 359 the most recent data packet that advanced the cumulative 360 acknowledgment field [RFC1323]. If timestamps are used, and the 361 sender stores the timestamp of the last acknowledged segment, then 362 the timestamp echoed by duplicate acknowledgments can be used to 363 distinguish between a retransmitted packet that was dropped and 364 three duplicate acknowledgments from the unnecessary 365 retransmission of three packets. 367 4.1. ACK Heuristic 369 If the ACK-based heuristic is used, then following the advancement of 370 the cumulative acknowledgment field, the sender stores the value of 371 the previous cumulative acknowledgment as prev_highest_ack, and stores 372 the latest cumulative ACK as highest_ack. In addition, the following 373 step is performed if Step 1 in Section 3 fails, before proceeding to 374 Step 1B. 376 1*) If the Cumulative Acknowledgment field didn't cover more than 377 "recover", check to see if the congestion window is greater 378 than SMSS bytes and the difference between highest_ack and 379 prev_highest_ack is at most 4*SMSS bytes. If true, duplicate 380 ACKs indicate a lost segment (proceed to Step 1A in Section 381 3). Otherwise, duplicate ACKs likely result from unnecessary 382 retransmissions (proceed to Step 1B in Section 3). 384 The congestion window check serves to protect against fast retransmit 385 immediately after a retransmit timeout. 387 If several ACKs are lost, the sender can see a jump in the cumulative 388 ACK of more than three segments, and the heuristic can fail. 389 RFC 5681 recommends that a receiver should 390 send duplicate ACKs for every out-of-order data packet, such as a 391 data packet received during Fast Recovery. The ACK heuristic is more 392 likely to fail if the receiver does not follow this advice, because 393 then a smaller number of ACK losses are needed to produce a 394 sufficient jump in the cumulative ACK. 396 4.2. Timestamp Heuristic 398 If this heuristic is used, the sender stores the timestamp of the 399 last acknowledged segment. In addition, the second paragraph of step 400 1 in Section 3 is replaced as follows: 402 1**) If the Cumulative Acknowledgment field didn't cover more than 403 "recover", check to see if the echoed timestamp in the last 404 non-duplicate acknowledgment equals the 405 stored timestamp. If true, duplicate ACKs indicate a lost 406 segment (proceed to Step 1A in Section 3). Otherwise, duplicate 407 ACKs likely result from unnecessary retransmissions (proceed 408 to Step 1B in Section 3). 410 The timestamp heuristic works correctly, both when the receiver echoes 411 timestamps as specified by [RFC1323], and by its revision attempts. 412 However, if the receiver arbitrarily echoes timestamps, the heuristic 413 can fail. The heuristic can also fail if a timeout was spurious and 414 returning ACKs are not from retransmitted segments. This can be 415 prevented by detection algorithms such as [RFC3522]. 417 5. Implementation Issues for the Data Receiver 419 [RFC5681] specifies that "Out-of-order data segments SHOULD be 420 acknowledged immediately, in order to accelerate loss recovery." 421 Neal Cardwell has noted that some data receivers do not send an 422 immediate acknowledgment when they send a partial acknowledgment, 423 but instead wait first for their delayed acknowledgment timer to 424 expire [C98]. As [C98] notes, this severely limits the potential 425 benefit of NewReno by delaying the receipt of the partial 426 acknowledgment at the data sender. Echoing RFC 5681, our 427 recommendation is that the data receiver send an immediate 428 acknowledgment for an out-of-order segment, even when that 429 out-of-order segment fills a hole in the buffer. 431 6. Implementation Issues for the Data Sender 433 In Section 3, Step 5 above, it is noted that implementations should 434 take measures to avoid a possible burst of data when leaving Fast 435 Recovery, in case the amount of new data that the sender is eligible 436 to send due to the new value of the congestion window is large. This 437 can arise during NewReno when ACKs are lost or treated as pure window 438 updates, thereby causing the sender to underestimate the number of 439 new segments that can be sent during the recovery procedure. 440 Specifically, bursts can occur when the FlightSize is much less than 441 the new congestion window when exiting from Fast Recovery. One 442 simple mechanism to avoid a burst of data when leaving Fast Recovery 443 is to limit the number of data packets that can be sent in response 444 to a single acknowledgment. (This is known as "maxburst_" in the ns 445 simulator.) Other possible mechanisms for avoiding bursts include 446 rate-based pacing, or setting the slow-start threshold to the 447 resultant congestion window and then resetting the congestion window 448 to FlightSize. A recommendation on the general mechanism to avoid 449 excessively bursty sending patterns is outside the scope of this 450 document. 452 An implementation may want to use a separate flag to record whether 453 or not it is presently in the Fast Recovery procedure. The use of 454 the value of the duplicate acknowledgment counter for this purpose is 455 not reliable because it can be reset upon window updates and 456 out-of-order acknowledgments. 458 When updating the Cumulative Acknowledgment field outside of 459 Fast Recovery, the "recover" state variable may also need to be 460 updated in order to continue to permit possible entry into Fast 461 Recovery (Section 3, step 1). This issue arises when an update 462 of the Cumulative Acknowledgment field results in a sequence 463 wraparound that affects the ordering between the Cumulative 464 Acknowledgment field and the "recover" state variable. Entry 465 into Fast Recovery is only possible when the Cumulative 466 Acknowledgment field covers more than the "recover" state variable. 468 It is important for the sender to respond correctly to duplicate ACKs 469 received when the sender is no longer in Fast Recovery (e.g., because 470 of a Retransmit Timeout). The Limited Transmit procedure [RFC3042] 471 describes possible responses to the first and second duplicate 472 acknowledgments. When three or more duplicate acknowledgments are 473 received, the Cumulative Acknowledgment field doesn't cover more 474 than "recover", and a new Fast Recovery is not invoked, it is 475 important that the sender not execute the Fast Recovery steps (3) and 476 (4) in Section 3. Otherwise, the sender could end up in a chain of 477 spurious timeouts. We mention this only because several NewReno 478 implementations had this bug, including the implementation in the NS 479 simulator. 481 It has been observed that some TCP implementations enter a slow start 482 or congestion avoidance window updating algorithm immediately after 483 the cwnd is set by the equation found in (Section 3, step 5), even 484 without a new external event generating the cwnd change. Note that 485 after cwnd is set based on the procedure for exiting Fast Recovery 486 (Section 3, step 5), cwnd SHOULD NOT be updated until a further 487 event occurs (e.g., arrival of an ack, or timeout) after this 488 adjustment. 490 7. Security Considerations 492 RFC 5681 discusses general security considerations concerning TCP 493 congestion control. This document describes a specific algorithm 494 that conforms with the congestion control requirements of RFC 5681, 495 and so those considerations apply to this algorithm, too. There are 496 no known additional security concerns for this specific algorithm. 498 8. IANA Considerations 500 This document has no actions for IANA. 502 9. Conclusions 504 This document specifies the NewReno Fast Retransmit and Fast Recovery 505 algorithms for TCP. This NewReno modification to TCP can even be 506 important for TCP implementations that support the SACK option, 507 because the SACK option can only be used for TCP connections when 508 both TCP end-nodes support the SACK option. NewReno performs better 509 than Reno (RFC 5681) in a number of scenarios discussed herein. 511 A number of options to the basic algorithm presented in Section 3 are 512 also described in appendices to this document. These include the 513 handling of the retransmission timer (Appendix A), the response to 514 partial acknowledgments (Appendix B), and whether or not the sender 515 maintains a state variable called "recover" (Appendix C). 516 Our belief is that the differences between these variants of NewReno 517 are small compared to the differences between Reno and NewReno. 518 That is, the important thing is to implement NewReno instead of Reno, 519 for a TCP connection without SACK; it is less important exactly 520 which of the variants of NewReno is implemented. 522 10. Acknowledgments 524 Many thanks to Anil Agarwal, Mark Allman, Armando Caro, Jeffrey Hsu, 525 Vern Paxson, Kacheong Poon, Keyur Shah, and Bernie Volz for detailed 526 feedback on this document or on its precursor, RFC 2582. Jeffrey 527 Hsu provided clarifications on the handling of the recover variable 528 that were applied to RFC 3782 as errata, and now are in Section 8 529 of this document. Yoshifumi Nishida contributed a modification 530 to the fast recovery algorithm to account for the case in which 531 flightsize is 0 when the TCP sender leaves fast recovery, and the 532 TCP receiver uses delayed acknowledgments. Alexander Zimmermann 533 provided several suggestions to improve the clarity of the document. 535 11. References 537 11.1. Normative References 539 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 540 Requirement Levels", BCP 14, RFC 2119, March 1997. 542 [RFC2988] Paxson, V. and M. Allman, "Computing TCP's Retransmission 543 Timer", RFC 2988, November 2000. 545 [RFC5681] Allman, M., Paxson, V. and E. Blanton, "TCP Congestion 546 Control", RFC 5681, September 2009. 548 11.2. Informative References 550 [C98] Cardwell, N., "delayed ACKs for retransmitted packets: ouch!". 551 November 1998, Email to the tcpimpl mailing list, Message-ID 552 "Pine.LNX.4.02A.9811021421340.26785-100000@sake.cs.washington.edu", 553 archived at "http://tcp-impl.lerc.nasa.gov/tcp-impl". 555 [F98] Floyd, S., Revisions to RFC 2001, "Presentation to the TCPIMPL 556 Working Group", August 1998. URLs 557 "ftp://ftp.ee.lbl.gov/talks/sf-tcpimpl-aug98.ps" and 558 "ftp://ftp.ee.lbl.gov/talks/sf-tcpimpl-aug98.pdf". 560 [F03] Floyd, S., "Moving NewReno from Experimental to Proposed 561 Standard? Presentation to the TSVWG Working Group", March 2003. 562 URLs "http://www.icir.org/floyd/talks/newreno-Mar03.ps" and 563 "http://www.icir.org/floyd/talks/newreno-Mar03.pdf". 565 [FF96] Fall, K. and S. Floyd, "Simulation-based Comparisons of Tahoe, 566 Reno and SACK TCP", Computer Communication Review, July 1996. URL 567 "ftp://ftp.ee.lbl.gov/papers/sacks.ps.Z". 569 [F94] Floyd, S., "TCP and Successive Fast Retransmits", Technical 570 report, October 1994. URL 571 "ftp://ftp.ee.lbl.gov/papers/fastretrans.ps". 573 [GF04] Gurtov, A. and S. Floyd, "Resolving Acknowledgment Ambiguity 574 in non-SACK TCP", Next Generation Teletraffic and 575 Wired/Wireless Advanced Networking (NEW2AN'04), February 576 2004. URL "http://www.cs.helsinki.fi/u/gurtov/papers/ 577 heuristics.html". 579 [Gur03] Gurtov, A., "[Tsvwg] resolving the problem of unnecessary fast 580 retransmits in go-back-N", email to the tsvwg mailing list, message 581 ID <3F25B467.9020609@cs.helsinki.fi>, July 28, 2003. URL 582 "http://www1.ietf.org/mail-archive/working-groups/tsvwg/current/msg04334.html". 584 [Hen98] Henderson, T., Re: NewReno and the 2001 Revision. September 585 1998. Email to the tcpimpl mailing list, Message ID 586 "Pine.BSI.3.95.980923224136.26134A-100000@raptor.CS.Berkeley.EDU", 587 archived at "http://tcp-impl.lerc.nasa.gov/tcp-impl". 589 [Hoe95] Hoe, J., "Startup Dynamics of TCP's Congestion Control and 590 Avoidance Schemes", Master's Thesis, MIT, 1995. 592 [Hoe96] Hoe, J., "Improving the Start-up Behavior of a Congestion 593 Control Scheme for TCP", ACM SIGCOMM, August 1996. URL 594 "http://www.acm.org/sigcomm/sigcomm96/program.html". 596 [LM97] Lin, D. and R. Morris, "Dynamics of Random Early Detection", 597 SIGCOMM 97, September 1997. URL 598 "http://www.acm.org/sigcomm/sigcomm97/program.html". 600 [NS] The Network Simulator (NS). URL "http://www.isi.edu/nsnam/ns/". 602 [PF01] Padhye, J. and S. Floyd, "Identifying the TCP Behavior of Web 603 Servers", June 2001, SIGCOMM 2001. 605 [RFC1323] Jacobson, V., Braden, R. and D. Borman, "TCP Extensions for 606 High Performance", RFC 1323, May 1992. 608 [RFC2582] Floyd, S. and T. Henderson, "The NewReno Modification to 609 TCP's Fast Recovery Algorithm", RFC 2582, April 1999. 611 [RFC2883] Floyd, S., J. Mahdavi, M. Mathis, and M. Podolsky, "The 612 Selective Acknowledgment (SACK) Option for TCP, RFC 2883, July 2000. 614 [RFC3042] Allman, M., Balakrishnan, H. and S. Floyd, "Enhancing TCP's 615 Loss Recovery Using Limited Transmit", RFC 3042, January 2001. 617 [RFC3522] Ludwig, R. and M. Meyer, "The Eifel Detection Algorithm for 618 TCP", RFC 3522, April 2003. 620 [RFC3782] Floyd, S., T. Henderson, and A. Gurtov, "The NewReno 621 Modification to TCP's Fast Recovery Algorithm", RFC 3782, April 2004. 623 Appendix A. Resetting the Retransmit Timer in Response to Partial 624 Acknowledgments 626 One possible variant to the response to partial acknowledgments 627 specified in Section 3 concerns when to reset the retransmit timer 628 after a partial acknowledgment. The algorithm in Section 3, Step 5, 629 resets the retransmit timer only after the first partial ACK. In 630 this case, if a large number of packets were dropped from a window of 631 data, the TCP data sender's retransmit timer will ultimately expire, 632 and the TCP data sender will invoke Slow-Start. (This is illustrated 633 on page 12 of [F98].) We call this the Impatient variant of NewReno. 634 We note that the Impatient variant in Section 3 doesn't follow the 635 recommended algorithm in RFC 2988 of restarting the retransmit timer 636 after every packet transmission or retransmission (step 5.1 of 637 [RFC2988]). 639 In contrast, the NewReno simulations in [FF96] illustrate the 640 algorithm described above with the modification that the retransmit 641 timer is reset after each partial acknowledgment. We call this the 642 Slow-but-Steady variant of NewReno. In this case, for a window with 643 a large number of packet drops, the TCP data sender retransmits at 644 most one packet per roundtrip time. (This behavior is illustrated in 645 the New-Reno TCP simulation of Figure 5 in [FF96], and on page 11 of 646 [F98]). 648 When N packets have been dropped from a window of data for a large 649 value of N, the Slow-but-Steady variant can remain in Fast Recovery 650 for N round-trip times, retransmitting one more dropped packet each 651 round-trip time; for these scenarios, the Impatient variant gives a 652 faster recovery and better performance. 653 The Impatient variant can be particularly important for TCP 654 connections with large congestion windows. 656 One can also construct scenarios where the Slow-but-Steady variant 657 gives better performance than the Impatient variant. As an example, 658 this occurs when only a small number of packets are dropped, the RTO 659 is sufficiently small that the retransmit timer expires, and 660 performance would have been better without a retransmit timeout. 662 The Slow-but-Steady variant can also achieve higher goodput than the 663 Impatient variant, by avoiding unnecessary retransmissions. This 664 could be of special interest for cellular links, where every 665 transmission costs battery power and money. The 666 Slow-but-Steady variant can also be more robust to delay variation in 667 the network, where a delay spike might force the Impatient variant into 668 a timeout and go-back-N recovery. 670 Neither of the two variants discussed above are optimal. Our 671 recommendation is for the Impatient variant, as specified in Section 672 3 of this document, because of the poor performance of the 673 Slow-but-Steady variant for TCP connections with large congestion 674 windows. 676 One possibility for a more optimal algorithm would be one that 677 recovered from multiple packet drops as quickly as does slow-start, 678 while resetting the retransmit timers after each partial 679 acknowledgment, as described in the section below. We note, 680 however, that there is a limitation to the potential performance in 681 this case in the absence of the SACK option. 683 Appendix B. Retransmissions after a Partial Acknowledgment 685 One possible variant to the response to partial acknowledgments 686 specified in Section 3 would be to retransmit more than one packet 687 after each partial acknowledgment, and to reset the retransmit timer 688 after each retransmission. The algorithm specified in Section 3 689 retransmits a single packet after each partial acknowledgment. This 690 is the most conservative alternative, in that it is the least likely 691 to result in an unnecessarily-retransmitted packet. A variant that 692 would recover faster from a window with many packet drops would be to 693 effectively Slow-Start, retransmitting two packets after each partial 694 acknowledgment. Such an approach would take less than N roundtrip 695 times to recover from N losses [Hoe96]. However, in the absence of 696 SACK, recovering as quickly as slow-start introduces the likelihood 697 of unnecessarily retransmitting packets, and this could significantly 698 complicate the recovery mechanisms. 700 We note that the response to partial acknowledgments specified in 701 Section 3 of this document and in RFC 2582 differs from the response 702 in [FF96], even though both approaches only retransmit one packet in 703 response to a partial acknowledgment. Step 5 of Section 3 specifies 704 that the TCP sender responds to a partial ACK by deflating the 705 congestion window by the amount of new data acknowledged, adding 706 back SMSS bytes if the partial ACK acknowledges at least SMSS bytes 707 of new data, and sending a new segment if permitted by the new value 708 of cwnd. Thus, only one previously-sent packet is retransmitted in 709 response to each partial acknowledgment, but additional new packets 710 might be transmitted as well, depending on the amount of new data 711 acknowledged by the partial acknowledgment. In contrast, the 712 variant of NewReno illustrated in [FF96] simply set the congestion 713 window to ssthresh when a partial acknowledgment was received. The 714 approach in [FF96] is more conservative, and does not attempt to 715 accurately track the actual number of outstanding packets after a 716 partial acknowledgment is received. While either of these 717 approaches gives acceptable performance, the variant specified in 718 Section 3 recovers more smoothly when multiple packets are dropped 719 from a window of data. 721 Appendix C. Avoiding Multiple Fast Retransmits 723 This appendix describes the motivation for the sender's state 724 variable "recover". 726 In the absence of the SACK option or timestamps, a duplicate 727 acknowledgment carries no information to identify the data packet or 728 packets at the TCP data receiver that triggered that duplicate 729 acknowledgment. In this case, the TCP data sender is unable to 730 distinguish between a duplicate acknowledgment that results from a 731 lost or delayed data packet, and a duplicate acknowledgment that 732 results from the sender's unnecessary retransmission of a data packet 733 that had already been received at the TCP data receiver. Because of 734 this, with the Retransmit and Fast Recovery algorithms in Reno TCP, 735 multiple segment losses from a single window of data can sometimes 736 result in unnecessary multiple Fast Retransmits (and multiple 737 reductions of the congestion window) [F94]. 739 With the Fast Retransmit and Fast Recovery algorithms in Reno TCP, 740 the performance problems caused by multiple Fast Retransmits are 741 relatively minor compared to the potential problems with Tahoe TCP, 742 which does not implement Fast Recovery. Nevertheless, unnecessary 743 Fast Retransmits can occur with Reno TCP unless some explicit 744 mechanism is added to avoid this, such as the use of the "recover" 745 variable. (This modification is called "bugfix" in [F98], and is 746 illustrated on pages 7 and 9 of that document. Unnecessary Fast 747 Retransmits for Reno without "bugfix" is illustrated on page 6 of 749 [F98].) 751 Section 3 of [RFC2582] defined a default variant of NewReno TCP that 752 did not use the variable "recover", and did not check if duplicate 753 ACKs cover the variable "recover" before invoking Fast Retransmit. 754 With this default variant from RFC 2582, the problem of multiple Fast 755 Retransmits from a single window of data can occur after a Retransmit 756 Timeout (as in page 8 of [F98]) or in scenarios with reordering. 757 RFC 2582 also defined Careful and Less Careful variants of the NewReno 758 algorithm, and recommended the Careful variant. 760 The algorithm specified in Section 3 of this document corresponds to 761 the Careful variant of NewReno TCP from RFC 2582, and eliminates the 762 problem of multiple Fast Retransmits. This algorithm uses the 763 variable "recover", whose initial value is the initial send sequence 764 number. After each retransmit timeout, the highest sequence number 765 transmitted so far is recorded in the variable "recover". 767 Appendix D. Simulations 769 This section provides pointers to simulation scripts available in 770 the NS simulator that reproduce behavior described above. 772 In Section 3, a simple mechanism is described to limit the number of 773 data packets that can be sent in response to a single acknowledgment. 774 This is known as "maxburst_" in the NS simulator. 776 Simulations with NewReno are illustrated with the validation test 777 "tcl/test/test-all-newreno" in the NS simulator. The command 778 "../../ns test-suite-newreno.tcl reno" shows a simulation with Reno 779 TCP, illustrating the data sender's lack of response to a partial 780 acknowledgment. In contrast, the command "../../ns 781 test-suite-newreno.tcl newreno_B" shows a simulation with the same 782 scenario using the NewReno algorithms described in this paper. 784 Regarding the handling of duplicate acknowledgments after a timeout, 785 the congestion window check serves to protect against fast retransmit 786 immediately after a retransmit timeout, similar to the 787 "exitFastRetrans_" variable in NS. Examples of applying the ACK 788 heuristic (Section 4) are in validation tests "./test-all-newreno 789 newreno_rto_loss_ack" and "./test-all-newreno newreno_rto_dup_ack" in 790 directory "tcl/test" of the NS simulator. 791 If several ACKs are lost, the sender can see a jump in the cumulative 792 ACK of more than three segments, and the heuristic can fail. A 793 validation test for this scenario is "./test-all-newreno 794 newreno_rto_loss_ackf". 796 Examples of applying the timestamp heuristic (Section 4) are in 797 validation tests "./test-all-newreno newreno_rto_loss_tsh" and 798 "./test-all-newreno newreno_rto_dup_tsh". 800 Section 6 described a problem involving possible spurious timeouts, 801 and mentions that this bug existed in the NS simulator. 802 This bug in the NS simulator was fixed in July 2003, 803 with the variable "exitFastRetrans_". 805 Regarding the Slow-but-Steady and Impatient variants described 806 in Appendix A, The tests "ns 807 test-suite-newreno.tcl impatient1" and "ns test-suite-newreno.tcl 808 slow1" in the NS simulator illustrate a scenario in which the 809 Impatient variant performs better than the Slow-but-Steady 810 variant. The Impatient variant can be particularly important for TCP 811 connections with large congestion windows, as illustrated by the tests 812 "ns test-suite-newreno.tcl impatient4" and "ns test-suite-newreno.tcl 813 slow4" in the NS simulator. The tests 814 "ns test-suite-newreno.tcl impatient2" and 815 "ns test-suite-newreno.tcl slow2" in the NS simulator illustrate 816 scenarios in which the Slow-but-Steady variant outperforms the Impatient 817 variant. The tests "ns test-suite-newreno.tcl impatient3" and 818 "ns test-suite-newreno.tcl slow3" in the NS simulator illustrate 819 scenarios in which the Slow-but-Steady variants avoid unnecessary 820 retransmissions. 822 Appendix B describes different policies for partial window deflation. 823 The [FF96] behavior can be seen in the NS 824 simulator by setting the variable "partial_window_deflation_" for 825 "Agent/TCP/Newreno" to 0; the behavior specified in Section 3 is 826 achieved by setting "partial_window_deflation_" to 1. 828 Section 3 of [RFC2582] defined a default variant of NewReno TCP that 829 did not use the variable "recover", and did not check if duplicate 830 ACKs cover the variable "recover" before invoking Fast Retransmit. 831 With this default variant from RFC 2582, the problem of multiple Fast 832 Retransmits from a single window of data can occur after a Retransmit 833 Timeout (as in page 8 of [F98]) or in scenarios with reordering (as 834 An NS validation test "./test-all-newreno newreno5_noBF" in 835 directory "tcl/test" of the NS simulator illustartes the default 836 variant of NewReno TCP that doesn't use the variable "recover"; 837 this gives performance similar to that on page 8 of [F03]. 839 Appendix E. Comparisons between Reno and NewReno TCP 841 As we stated in the introduction, we believe that the NewReno 842 modification described in this document improves the performance of 843 the Fast Retransmit and Fast Recovery algorithms of Reno TCP in a 844 wide variety of scenarios. This has been discussed in some depth in 846 [FF96], which illustrates Reno TCP's poor performance when multiple 847 packets are dropped from a window of data and also illustrates 848 NewReno TCP's good performance in that scenario. 850 We do, however, know of one scenario where Reno TCP gives better 851 performance than NewReno TCP, that we describe here for the sake of 852 completeness. Consider a scenario with no packet loss, but with 853 sufficient reordering so that the TCP sender receives three duplicate 854 acknowledgments. This will trigger the Fast Retransmit and Fast 855 Recovery algorithms. With Reno TCP or with Sack TCP, this will 856 result in the unnecessary retransmission of a single packet, combined 857 with a halving of the congestion window (shown on pages 4 and 6 of 858 [F03]). With NewReno TCP, however, this reordering will also result 859 in the unnecessary retransmission of an entire window of data (shown 860 on page 5 of [F03]). 862 While Reno TCP performs better than NewReno TCP in the presence of 863 reordering, NewReno's superior performance in the presence of 864 multiple packet drops generally outweighs its less optimal 865 performance in the presence of reordering. (Sack TCP is the 866 preferred solution, with good performance in both scenarios.) This 867 document recommends the Fast Retransmit and Fast Recovery algorithms 868 of NewReno TCP instead of those of Reno TCP for those TCP connections 869 that do not support SACK. We would also note that NewReno's Fast 870 Retransmit and Fast Recovery mechanisms are widely deployed in TCP 871 implementations in the Internet today, as documented in [PF01]. For 872 example, tests of TCP implementations in several thousand web servers 873 in 2001 showed that for those TCP connections where the web browser 874 was not SACK-capable, more web servers used the Fast Retransmit and 875 Fast Recovery algorithms of NewReno than those of Reno or Tahoe TCP 876 [PF01]. 878 Appendix F. Changes Relative to RFC 2582 880 The purpose of this document is to advance the NewReno's Fast 881 Retransmit and Fast Recovery algorithms in RFC 2582 to Standards Track. 883 The main change in this document relative to RFC 2582 is to specify 884 the Careful variant of NewReno's Fast Retransmit and Fast Recovery 885 algorithms. The base algorithm described in RFC 2582 did not attempt 886 to avoid unnecessary multiple Fast Retransmits that can occur after a 887 timeout (described in more detail in the section above). However, 888 RFC 2582 also defined "Careful" and "Less Careful" variants that 889 avoid these unnecessary Fast Retransmits, and recommended the Careful 890 variant. This document specifies the previously-named "Careful" 891 variant as the basic version of NewReno. As described below, this 892 algorithm uses a variable "recover", whose initial value is the send 893 sequence number. 895 The algorithm specified in Section 3 checks whether the 896 acknowledgment field of a partial acknowledgment covers *more* than 897 "recover", as defined in Section 3. Another possible variant would be 898 to simply require that the acknowledgment field covers *more than or 899 equal to* "recover" before initiating another Fast Retransmit. We 900 called this the Less Careful variant in RFC 2582. 902 There are two separate scenarios in which the TCP sender could 903 receive three duplicate acknowledgments acknowledging "recover" but 904 no more than "recover". One scenario would be that the data sender 905 transmitted four packets with sequence numbers higher than "recover", 906 that the first packet was dropped in the network, and the following 907 three packets triggered three duplicate acknowledgments 908 acknowledging "recover". The second scenario would be that the 909 sender unnecessarily retransmitted three packets below "recover", and 910 that these three packets triggered three duplicate acknowledgments 911 acknowledging "recover". In the absence of SACK, the TCP sender is 912 unable to distinguish between these two scenarios. 914 For the Careful variant of Fast Retransmit, the data sender would 915 have to wait for a retransmit timeout in the first scenario, but 916 would not have an unnecessary Fast Retransmit in the second 917 scenario. For the Less Careful variant to Fast Retransmit, the data 918 sender would Fast Retransmit as desired in the first scenario, and would 919 unnecessarily Fast Retransmit in the second scenario. This document 920 only specifies the Careful variant in Section 3. Unnecessary Fast 921 Retransmits with the Less Careful variant in scenarios with 922 reordering are illustrated in page 8 of [F03]. 924 The document also specifies two heuristics that the TCP sender MAY 925 use to decide to invoke Fast Retransmit even when the three duplicate 926 acknowledgments do not cover more than "recover". These heuristics, 927 an ACK-based heuristic and a timestamp heuristic, are described in 928 Sections 6.1 and 6.2 respectively. 930 Appendix G. Changes Relative to RFC 3782 932 In [RFC3782], the cwnd after Full ACK reception will be set to 933 (1) min (ssthresh, FlightSize + SMSS) or (2) ssthresh. However, 934 there is a risk in the first logic which results in performance 935 degradation. With the first logic, if FlightSize is zero, the result 936 will be 1 SMSS. This means TCP can transmit only 1 segment at this 937 moment, which can cause delay in ACK transmission at receiver due to 938 delayed ACK algorithm. 940 The FlightSize on Full ACK reception can be zero in some situations. 941 A typical example is where sending window size during fast recovery is 942 small. In this case, the retransmitted packet and new data packets can 943 be transmitted within a short interval. If all these packets 944 successfully arrive, the receiver may generate a Full ACK that 945 acknowledges all outstanding data. Even if window size is not small, 946 loss of ACK packets or receive buffer shortage during fast recovery can 947 also increase the possibility to fall into this situation. 949 The proposed fix in this document ensures that sender TCP transmits at 950 least two segments on Full ACK reception. 952 In addition, errata for RFC3782 (editorial clarification to Section 8 953 of RFC2582, which is now Section 6 of this document) has been applied. 955 Sections 4, 5, and 9-11 of RFC2582 were relocated to appendices of 956 this document since they are non-normative and provide background 957 information and references to simulation results. 959 Appendix H. Document Revision History 961 To be removed upon publication 963 +----------+--------------------------------------------------+ 964 | Revision | Comments | 965 +----------+--------------------------------------------------+ 966 | draft-00 | RFC3782 errata applied, and changes applied from | 967 | | draft-nishida-newreno-modification-02 | 968 +----------+--------------------------------------------------+ 969 | draft-01 | Non-normative sections moved to appendices, | 970 | | editorial clarifications applied as suggested | 971 | | by Alexander Zimmermann. | 972 +----------+--------------------------------------------------+ 974 Authors' Addresses 976 Tom Henderson 977 The Boeing Company 979 EMail: thomas.r.henderson@boeing.com 981 Sally Floyd 982 International Computer Science Institute 984 Phone: +1 (510) 666-2989 985 EMail: floyd@acm.org 986 URL: http://www.icir.org/floyd/ 987 Andrei Gurtov 988 HIIT 989 Helsinki Institute for Information Technology 990 P.O. Box 19215 991 00076 Aalto 992 Finland 994 EMail: gurtov@hiit.fi 996 Yoshifumi Nishida 997 WIDE Project 998 Endo 5322 999 Fujisawa, Kanagawa 252-8520 1000 Japan 1002 Email: nishida@wide.ad.jp