2 TCP Maintenance Working Group Y. Cheng 3 Internet-Draft N. Cardwell 4 Intended status: Experimental Google, Inc 5 Expires: January 7, 2017 July 6, 2016 7 RACK: a time-based fast loss detection algorithm for TCP 8 draft-cheng-tcpm-rack-01 10 Abstract 12 This document presents a new TCP loss detection algorithm called RACK 13 ("Recent ACKnowledgment"). RACK uses the notion of time, instead of 14 packet or sequence counts, to detect losses, for modern TCP 15 implementations that can support per-packet timestamps and the 16 selective acknowledgment (SACK) option. It is intended to replace 17 the conventional DUPACK threshold approach and its variants, as well 18 as other nonstandard approaches. 20 Status of This Memo 22 This Internet-Draft is submitted in full conformance with the 23 provisions of BCP 78 and BCP 79. 25 Internet-Drafts are working documents of the Internet Engineering 26 Task Force (IETF).
Note that other groups may also distribute 27 working documents as Internet-Drafts. The list of current Internet- 28 Drafts is at http://datatracker.ietf.org/drafts/current/. 30 Internet-Drafts are draft documents valid for a maximum of six months 31 and may be updated, replaced, or obsoleted by other documents at any 32 time. It is inappropriate to use Internet-Drafts as reference 33 material or to cite them other than as "work in progress." 35 This Internet-Draft will expire on January 7, 2017. 37 Copyright Notice 39 Copyright (c) 2016 IETF Trust and the persons identified as the 40 document authors. All rights reserved. 42 This document is subject to BCP 78 and the IETF Trust's Legal 43 Provisions Relating to IETF Documents 44 (http://trustee.ietf.org/license-info) in effect on the date of 45 publication of this document. Please review these documents 46 carefully, as they describe your rights and restrictions with respect 47 to this document. Code Components extracted from this document must 48 include Simplified BSD License text as described in Section 4.e of 49 the Trust Legal Provisions and are provided without warranty as 50 described in the Simplified BSD License. 52 1. Introduction 54 This document presents a new loss detection algorithm called RACK 55 ("Recent ACKnowledgment"). RACK uses the notion of time instead of 56 the conventional packet or sequence counting approaches for detecting 57 losses. RACK deems a packet lost if some packet sent sufficiently 58 later has been delivered. It does this by recording packet 59 transmission times and inferring losses using cumulative 60 acknowledgments or selective acknowledgment (SACK) TCP options. 62 In the last couple of years we have been observing several 63 increasingly common loss and reordering patterns in the Internet: 65 1. Lost retransmissions. Traffic policers [POLICER16] and burst 66 losses often cause retransmissions to be lost again, severely 67 increasing TCP latency. 69 2. Tail drops. 
Structured request-response traffic turns more 70 losses into tail drops. In such cases, TCP is 71 application-limited, so it cannot send new data to probe losses and has to 72 rely on retransmission timeouts (RTOs). 74 3. Reordering. Link layer protocols (e.g., 802.11 block ACK) or 75 routers' internal load-balancing can deliver TCP packets out of 76 order. The degree of such reordering is usually within the order 77 of the path round trip time. 79 Although TCP stacks (e.g., Linux) implement many of the standard 80 and proposed loss detection algorithms 81 [RFC3517][RFC4653][RFC5827][RFC5681][RFC6675][RFC7765][FACK] 82 [THIN-STREAM][TLP], we have found that together they do not perform well. 83 The main reason is that many of them are based on the classic rule of 84 counting duplicate acknowledgments [RFC5681]. They can detect 85 loss either quickly or accurately, but not both, especially when the sender 86 is application-limited or the reordering is unpredictable. 87 And under these conditions none of them can detect lost 88 retransmissions well. 90 Also, these algorithms, including RFCs, rarely address their 91 interactions with other algorithms. For example, FACK may consider a 92 packet lost while RFC3517 may not. Implementing N algorithms 93 while dealing with N^2 interactions is a daunting and 94 error-prone task. 96 The goal of RACK is to solve all the problems above by replacing many 97 of the loss detection algorithms above with one simpler, and also 98 more effective, algorithm. 100 2. Overview 102 The main idea behind RACK is that if a packet has been delivered out 103 of order, then the packets sent chronologically before that were 104 either lost or reordered. This concept is not fundamentally 105 different from [RFC5681][RFC3517][FACK].
But the key innovation in 106 RACK is to use a per-packet transmission timestamp and widely 107 deployed SACK options to conduct time-based inferences instead of 108 inferring losses with packet or sequence counting approaches. 110 Using a threshold for counting duplicate acknowledgments (i.e., 111 dupthresh) is no longer reliable because of today's prevalent 112 reordering patterns. A common type of reordering is that the last 113 "runt" packet of a window's worth of packet bursts gets delivered 114 first, then the rest arrive shortly after in order. To handle this 115 effectively, a sender would need to constantly adjust the dupthresh 116 to the burst size; but this would risk increasing the frequency of 117 RTOs on real losses. 119 Today's prevalent lost retransmissions also cause problems with 120 packet-counting approaches [RFC5681][RFC3517][FACK], since those 121 approaches depend on reasoning in sequence number space. 122 Retransmissions break the direct correspondence between ordering in 123 sequence space and ordering in time. So when retransmissions are 124 lost, sequence-based approaches are often unable to infer and quickly 125 repair losses that can be deduced with time-based approaches. 127 Instead of counting packets, RACK uses the most recently delivered 128 packet's transmission time to judge if some packets sent previous to 129 that time have "expired" by passing a certain reordering settling 130 window. On each ACK, RACK marks any already-expired packets lost, 131 and for any packets that have not yet expired it waits until the 132 reordering window passes and then marks those lost as well. In 133 either case, RACK can repair the loss without waiting for a (long) 134 RTO. RACK can be applied to both fast recovery and timeout recovery, 135 and can detect losses on both originally transmitted and 136 retransmitted packets, making it a great all-weather recovery 137 mechanism. 139 3. 
Requirements 141 The reader is expected to be familiar with the definitions given in 142 the TCP congestion control [RFC5681] and selective acknowledgment 144 [RFC2018] RFCs. Familiarity with the conservative SACK-based 145 recovery for TCP [RFC6675] is not required but is helpful. 147 RACK has three requirements: 149 1. The connection MUST use selective acknowledgment (SACK) options 150 [RFC2018]. 152 2. For each packet sent, the sender MUST store its most recent 153 transmission time with (at least) millisecond granularity. For 154 round-trip times lower than a millisecond (e.g., intra-datacenter 155 communications) microsecond granularity would significantly reduce 156 the detection latency but is not required. 158 3. For each packet sent, the sender MUST store whether the packet 159 has been retransmitted or not. 161 We assume that requirement 1 implies the sender keeps a SACK 162 scoreboard, which is a data structure to store selective 163 acknowledgment information on a per-connection basis. For ease 164 of explaining the algorithm, we use a pseudo-scoreboard that manages 165 the data in sequence number ranges. But the specifics of the data 166 structure are left to the implementor. 168 RACK requires no changes on the receiver. 170 4. Definitions of variables 172 A sender needs to store these new RACK variables: 174 "Packet.xmit_ts" is the time of the last transmission of a data 175 packet, including retransmissions, if any. The sender needs to 176 record the transmission time for each packet sent and not yet 177 acknowledged. The time MUST be stored at millisecond granularity or 178 finer. 180 "RACK.xmit_ts" is the most recent Packet.xmit_ts among all the 181 packets that were delivered (either cumulatively acknowledged or 182 selectively acknowledged) on the connection. 184 "RACK.end_seq" is the ending TCP sequence number of the packet that 185 was used to record the RACK.xmit_ts above.
187 "RACK.RTT" is the RTT measured when RACK.xmit_ts, above, 188 was last updated. It is the RTT of the most recently transmitted packet 189 that has been delivered (either cumulatively acknowledged or 190 selectively acknowledged) on the connection. 192 "RACK.reo_wnd" is a reordering window for the connection, computed in 193 the unit of time used for recording packet transmission times. It is 194 used to defer the moment at which RACK marks a packet lost. 196 "RACK.min_RTT" is the estimated minimum round-trip time (RTT) of the 197 connection. 199 Note that the Packet.xmit_ts variable is per packet in flight. The 200 RACK.xmit_ts, RACK.RTT, RACK.reo_wnd, and RACK.min_RTT variables are 201 per connection. 203 5. Algorithm Details 205 5.1. Transmitting a data packet 207 Upon transmitting a new packet or retransmitting an old packet, 208 record the time in Packet.xmit_ts. RACK does not care if the 209 retransmission is triggered by an ACK, new application data, an RTO, 210 or any other means. 212 5.2. Upon receiving an ACK 214 Step 1: Update RACK.min_RTT. 216 Use the RTT measurements obtained in [RFC6298] or [RFC7323] to update 217 the estimated minimum RTT in RACK.min_RTT. The sender can track a 218 simple global minimum of all RTT measurements from the connection, or 219 a windowed min-filtered value of recent RTT measurements. This 220 document does not specify an exact approach. 222 Step 2: Update RACK.reo_wnd. 224 To handle the prevalent small degree of reordering, RACK.reo_wnd 225 serves as an allowance for settling time before marking a packet 226 lost. By default it is 1 millisecond. We RECOMMEND implementing the 227 reordering detection in [REORDER-DETECT][RFC4737] to dynamically 228 adjust the reordering window. When the sender detects packet 229 reordering, RACK.reo_wnd MAY be changed to RACK.min_RTT/4. We discuss 230 the reordering window further in the next section.
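To make Steps 1 and 2 concrete, the per-connection update can be sketched as follows. This is a non-normative illustration: the class name RackState and method on_rtt_sample() are hypothetical, and a simple global minimum is used for RACK.min_RTT (the draft allows a windowed min-filter instead).

```python
# Non-normative sketch of Steps 1 and 2. RackState and on_rtt_sample()
# are hypothetical names; the draft leaves the exact min-RTT estimator
# to the implementor (a global minimum is used here).

REO_WND_DEFAULT_MS = 1.0  # default RACK.reo_wnd: 1 millisecond

class RackState:
    def __init__(self):
        self.min_rtt = float("inf")        # RACK.min_RTT
        self.reo_wnd = REO_WND_DEFAULT_MS  # RACK.reo_wnd
        self.reordering_seen = False       # set by a reordering detector

    def on_rtt_sample(self, rtt_ms):
        # Step 1: update RACK.min_RTT (simple global minimum; a windowed
        # min-filter is an equally valid choice per the draft).
        self.min_rtt = min(self.min_rtt, rtt_ms)
        # Step 2: once reordering has been detected, RACK.reo_wnd MAY be
        # set to RACK.min_RTT / 4; otherwise keep the 1 ms default.
        if self.reordering_seen:
            self.reo_wnd = self.min_rtt / 4

state = RackState()
state.on_rtt_sample(40.0)     # min_rtt becomes 40 ms, reo_wnd stays 1 ms
state.reordering_seen = True  # e.g., reported by [REORDER-DETECT] logic
state.on_rtt_sample(48.0)     # min_rtt stays 40 ms, reo_wnd becomes 10 ms
```

Note how the reordering window widens only after reordering has actually been observed, so a connection with no reordering keeps the fast 1 ms default.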
232 Step 3: Advance RACK.xmit_ts and update RACK.RTT and RACK.end_seq 234 Given the information provided in an ACK, each packet cumulatively 235 ACKed or SACKed is marked as delivered in the scoreboard. Among all 236 the packets newly ACKed or SACKed in the connection, record the most 237 recent Packet.xmit_ts in RACK.xmit_ts if it is ahead of RACK.xmit_ts. 238 Ignore the packet if any of its TCP sequences has been retransmitted 239 before and either of two conditions is true: 241 1. The Timestamp Echo Reply field (TSecr) of the ACK's timestamp 242 option [RFC7323], if available, indicates the ACK was not 243 acknowledging the last retransmission of the packet. 245 2. The packet was last retransmitted less than RACK.min_rtt ago. 246 While it is still possible the packet was spuriously retransmitted 247 because of a recent RTT decrease, our experience 248 suggests this is a reasonable heuristic. 250 If this ACK causes a change to RACK.xmit_ts then record the RTT and 251 sequence implied by this ACK: 253 RACK.RTT = Now() - RACK.xmit_ts 254 RACK.end_seq = Packet.end_seq 256 Exit here and omit the following steps if RACK.xmit_ts has not 257 changed. 259 Step 4: Detect losses. 261 For each packet that has not been fully SACKed, if RACK.xmit_ts is 262 after Packet.xmit_ts + RACK.reo_wnd, then mark the packet (or its 263 corresponding sequence range) lost in the scoreboard. The rationale 264 is that if another packet that was sent later has been delivered, and 265 the reordering window or "reordering settling time" has already 266 passed, the packet was likely lost. 268 If a packet that was sent later has been delivered, but the 269 reordering window has not passed, then it is not yet safe to deem the 270 given packet lost. Using the basic algorithm above, the sender would 271 wait for the next ACK to further advance RACK.xmit_ts; but this risks 272 a timeout (RTO) if no more ACKs come back (e.g., due to losses or 273 application limits).
For timely loss detection, the sender MAY 274 install a "reordering settling" timer set to fire at the earliest 275 moment at which it is safe to conclude that some packet is lost. The 276 earliest moment is the time it takes to expire the reordering window 277 of the earliest unacked packet in flight. 279 This timer expiration value can be derived as follows. As a starting 280 point, we consider that the reordering window has passed if the RACK 281 packet was sent sufficiently after the packet in question, or a 282 sufficient time has elapsed since the RACK packet was S/ACKed, or 283 some combination of the two. More precisely, RACK marks a packet as 284 lost if the reordering window for a packet has elapsed through the 285 sum of: 287 1. delta in transmit time between a packet and the RACK packet 288 2. delta in time between the S/ACK of the RACK packet (RACK.ack_ts) 289 and now 291 So we mark a packet as lost if: 293 RACK.xmit_ts > Packet.xmit_ts AND 294 (RACK.xmit_ts - Packet.xmit_ts) + (now - RACK.ack_ts) > RACK.reo_wnd 296 If we solve this second condition for "now", the moment at which we 297 can declare a packet lost, then we get: 299 now > Packet.xmit_ts + RACK.reo_wnd + (RACK.ack_ts - RACK.xmit_ts) 301 Then (RACK.ack_ts - RACK.xmit_ts) is just the RTT of the packet we 302 used to set RACK.xmit_ts, so this reduces to: 304 now > Packet.xmit_ts + RACK.RTT + RACK.reo_wnd 306 The following pseudocode implements the algorithm above. When an ACK 307 is received or the RACK timer expires, call RACK_detect_loss(). The 308 algorithm includes an additional optimization to break timestamp ties 309 by using the TCP sequence space. The optimization is particularly 310 useful to detect losses in a timely manner with TCP Segmentation 311 Offload, where multiple packets in one TSO blob have identical 312 timestamps. It is also useful when the timestamp clock granularity 313 is close to or longer than the actual round trip time. 
315 RACK_detect_loss(): 316 min_timeout = 0 318 For each packet, Packet, in the scoreboard: 319 If Packet is already SACKed, ACKed, 320 or marked lost and not yet retransmitted: 321 Skip to the next packet 323 If Packet.xmit_ts > RACK.xmit_ts: 324 Skip to the next packet 325 If Packet.xmit_ts == RACK.xmit_ts AND Packet.end_seq > RACK.end_seq: // Timestamp tie breaker 326 Skip to the next packet 328 timeout = Packet.xmit_ts + RACK.RTT + RACK.reo_wnd + 1 329 If Now() >= timeout: 330 Mark Packet lost 331 Else If (min_timeout == 0) or (timeout is before min_timeout): 332 min_timeout = timeout 334 If min_timeout != 0: 335 Arm a timer to call RACK_detect_loss() at min_timeout 337 6. Analysis and Discussion 339 6.1. Advantages 341 The biggest advantage of RACK is that every data packet, whether it 342 is an original data transmission or a retransmission, can be used to 343 detect losses of the packets sent prior to it. 345 Example: tail drop. Consider a sender that transmits a window of 346 three data packets (P1, P2, P3), and P1 and P3 are lost. Suppose the 347 transmission of each packet is at least RACK.reo_wnd (1 millisecond 348 by default) after the transmission of the previous packet. RACK will 349 mark P1 as lost when the SACK of P2 is received, and this will 350 trigger the retransmission of P1 as R1. When R1 is cumulatively 351 acknowledged, RACK will mark P3 as lost and the sender will 352 retransmit P3 as R3. This example illustrates how RACK is able to 353 repair certain drops at the tail of a transaction without any timer. 354 Notice that neither the conventional duplicate ACK threshold 355 [RFC5681], nor [RFC6675], nor the Forward Acknowledgment [FACK] 356 algorithm can detect such losses, because of the required packet or 357 sequence count. 359 Example: lost retransmit. Consider a window of three data packets 360 (P1, P2, P3) that are sent; P1 and P2 are dropped.
Suppose the 361 transmission of each packet is at least RACK.reo_wnd (1 millisecond 362 by default) after the transmission of the previous packet. When P3 363 is SACKed, RACK will mark P1 and P2 lost and they will be 364 retransmitted as R1 and R2. Suppose R1 is lost again (as a tail 365 drop) but R2 is SACKed; RACK will mark R1 lost for retransmission 366 again. Again, neither the conventional three duplicate ACK threshold 367 approach, nor [RFC6675], nor the Forward Acknowledgment [FACK] 368 algorithm can detect such losses. And such a lost retransmission is 369 very common when TCP is being rate-limited, particularly by token 370 bucket policers with large bucket depth and low rate limit. 371 Retransmissions are often lost repeatedly because standard congestion 372 control requires multiple round trips to reduce the rate below the 373 policed rate. 375 Example: (small) degree of reordering. Consider a common reordering 376 event: a window of packets is sent as (P1, P2, P3). P1 and P2 carry 377 a full payload of MSS octets, but P3 has only a 1-octet payload due 378 to application-limited behavior. Suppose the sender has detected 379 reordering previously (e.g., by implementing the algorithm in 380 [REORDER-DETECT]) and thus RACK.reo_wnd is min_RTT/4. Now P3 is 381 reordered and delivered first, before P1 and P2. As long as P1 and 382 P2 are delivered within min_RTT/4, RACK will not consider P1 and P2 383 lost. But if P1 and P2 are delivered outside the reordering window, 384 then RACK will still falsely mark P1 and P2 lost. We discuss how to 385 reduce the false positives at the end of this section. 387 The examples above show that RACK is particularly useful when the 388 sender is limited by the application, which is common for 389 interactive, request/response traffic. Similarly, RACK still works 390 when the sender is limited by the receive window, which is common for 391 applications that use the receive window to throttle the sender.
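The tail-drop example above can be replayed with a small, non-normative Python sketch of the marking step. All names here (Packet, rack_detect_loss) are illustrative simplifications of the draft's variables, and the timer-arming part of Step 4 is omitted for brevity.

```python
# Non-normative sketch replaying a tail-drop scenario: P1 and P3 are
# lost, P2 is SACKed. Packet and rack_detect_loss() are hypothetical
# simplifications of the draft's scoreboard and Step 4.

RACK_RTT_MS = 100.0  # assumed RTT of the SACKed (RACK) packet
REO_WND_MS = 1.0     # default RACK.reo_wnd

class Packet:
    def __init__(self, name, xmit_ts):
        self.name = name        # for readability only
        self.xmit_ts = xmit_ts  # Packet.xmit_ts, in ms
        self.sacked = False
        self.lost = False

def rack_detect_loss(scoreboard, rack_xmit_ts, now):
    # Mark every unSACKed packet lost whose reordering window, anchored
    # at the RACK packet's transmission time, has already passed.
    for pkt in scoreboard:
        if pkt.sacked or pkt.xmit_ts > rack_xmit_ts:
            continue  # delivered, or sent after the RACK packet
        if now >= pkt.xmit_ts + RACK_RTT_MS + REO_WND_MS:
            pkt.lost = True

# P1, P2, P3 sent 2 ms apart; the SACK of P2 arrives one RTT after P2.
p1, p2, p3 = Packet("P1", 0.0), Packet("P2", 2.0), Packet("P3", 4.0)
p2.sacked = True
rack_detect_loss([p1, p2, p3], rack_xmit_ts=p2.xmit_ts, now=2.0 + RACK_RTT_MS)
# P1 (sent 2 ms, i.e. more than reo_wnd, before P2) is marked lost
# without any duplicate ACK threshold; P3 was sent after the RACK
# packet, so it is left pending.
```

A dupthresh-based detector would need two more duplicate ACKs that never arrive here; RACK needs only the single SACK of P2.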
393 For some implementations (e.g., Linux), RACK works quite efficiently 394 with TCP Segmentation Offload (TSO). RACK always marks the entire 395 TSO blob lost because the packets in the same TSO blob have the same 396 transmission timestamp. By contrast, the counting-based algorithms 397 (e.g., [RFC3517][RFC5681]) may mark only a subset of packets in the 398 TSO blob lost, forcing the stack to perform expensive fragmentation 399 of the TSO blob, or to selectively tag individual packets lost in the 400 scoreboard. 402 6.2. Disadvantages 404 RACK requires the sender to record the transmission time of each 405 packet sent at a clock granularity of one millisecond or finer. TCP 406 implementations that record this already for RTT estimation do not 407 require any new per-packet state. But implementations that are not 408 yet recording packet transmission times will need to add per-packet 409 internal state (commonly either 4 or 8 octets per packet) to track 410 transmission times. In contrast, the conventional approach requires only 411 one variable to track the duplicate ACK count. 413 6.3. Adjusting the reordering window 415 RACK uses a reordering window of min_rtt / 4. It uses the minimum 416 RTT to accommodate reordering introduced by packets traversing 417 slightly different paths (e.g., router-based parallelism schemes) or 418 out-of-order deliveries in the lower link layer (e.g., wireless links 419 using link-layer retransmission). Alternatively, RACK can use the 420 smoothed RTT used in RTT estimation [RFC6298]. However, smoothed RTT 421 can be inflated by orders of magnitude due to 422 congestion and buffer-bloat, which would result in an overly 423 conservative reordering window and slow loss detection.
Furthermore, 424 RACK uses a quarter of minimum RTT because Linux TCP uses the same 425 factor in its implementation to delay Early Retransmit [RFC5827] to 426 reduce spurious loss detections in the presence of reordering, and 427 experience shows that this works reasonably well. 429 One potential improvement is to further adapt the reordering window 430 by measuring the degree of reordering in time, instead of packet 431 distances. But that requires storing the delivery timestamp of each 432 packet. Some scoreboard implementations currently merge SACKed 433 packets together to support TSO (TCP Segmentation Offload) for faster 434 scoreboard indexing. Supporting per-packet delivery timestamps is 435 difficult in such implementations. However, we acknowledge that the 436 current metric can be improved by further research. 438 6.4. Relationships with other loss recovery algorithms 440 The primary motivation of RACK is to ultimately provide a simple and 441 general replacement for some of the standard loss recovery algorithms 442 [RFC5681][RFC6675][RFC5827][RFC4653] and nonstandard ones 443 [FACK][THIN-STREAM]. While RACK can be a supplemental loss detection 444 mechanism on top of these algorithms, this is not necessary, because RACK 445 implicitly subsumes most of them. 447 [RFC5827][RFC4653][THIN-STREAM] dynamically adjust the duplicate ACK 448 threshold based on the current or previous flight sizes. RACK takes 449 a different approach, using only one ACK event and a reordering 450 window. RACK can be seen as an extended Early Retransmit [RFC5827] 451 without a FlightSize limit but with an additional reordering window. 452 [FACK] considers an original packet to be lost when its sequence 453 range is sufficiently far below the highest SACKed sequence.
In some 454 sense RACK can be seen as a generalized form of FACK that operates in 455 time space instead of sequence space, enabling it to better handle 456 reordering, application-limited traffic, and lost retransmissions. 458 Nevertheless, RACK is still an experimental algorithm. Since the 459 oldest loss detection algorithm, the 3 duplicate ACK threshold 460 [RFC5681], has been standardized and widely deployed, we RECOMMEND 461 TCP implementations use both RACK and the algorithm specified in 462 Section 3.2 of [RFC5681] for compatibility. 464 RACK is compatible with and does not interfere with the standard 465 RTO [RFC6298], RTO-restart [RFC7765], F-RTO [RFC5682] and Eifel 466 algorithms [RFC3522]. This is because RACK only detects loss by 467 using ACK events. It neither changes the timer calculation nor 468 detects spurious timeouts. 470 Furthermore, RACK naturally works well with Tail Loss Probe [TLP] 471 because a tail loss probe solicits either an ACK or a SACK, which can 472 be used by RACK to detect more losses. RACK can be used to relax 473 TLP's requirement for using FACK and retransmitting the 474 highest-sequenced packet, because RACK is agnostic to packet sequence 475 numbers, and uses transmission time instead. Thus TLP can be 476 modified to retransmit the first unacknowledged packet, which can 477 improve application latency. 479 6.5. Interaction with congestion control 481 RACK intentionally decouples loss detection from congestion control. 482 RACK only detects losses; it does not modify the congestion control 483 algorithm [RFC5681][RFC6937]. However, RACK may detect losses 484 earlier or later than the conventional duplicate ACK threshold 485 approach does. A packet marked lost by RACK SHOULD NOT be 486 retransmitted until congestion control deems this appropriate (e.g., 487 using [RFC6937]). 489 RACK is applicable for both fast recovery and recovery after a 490 retransmission timeout (RTO) in [RFC5681].
The distinction between 491 fast recovery and RTO recovery is not necessary because RACK is purely 492 based on the transmission time order of packets. When a packet 493 retransmitted by RTO is acknowledged, RACK will mark any unacked 494 packet sent sufficiently prior to the RTO as lost, because at least 495 one RTT has elapsed since these packets were sent. 497 6.6. RACK for other transport protocols 499 RACK can be implemented in other transport protocols. The algorithm 500 can be simplified by skipping step 3 if the protocol supports a unique 501 transmission or packet identifier (e.g., TCP echo options). For 502 example, the QUIC protocol implements RACK [QUIC-LR]. 504 7. Security Considerations 506 RACK does not change the risk profile for TCP. 508 An interesting scenario is ACK-splitting attacks [SCWA99]: for an 509 MSS-size packet sent, the receiver or the attacker might send MSS 510 ACKs that SACK or acknowledge one additional byte per ACK. This 511 would not fool RACK. RACK.xmit_ts would not advance because all the 512 sequences of the packet are transmitted at the same time (carry the 513 same transmission timestamp). In other words, SACKing only one byte 514 of a packet or SACKing the packet in its entirety has the same effect on 515 RACK. 517 8. IANA Considerations 519 This document makes no request of IANA. 521 Note to RFC Editor: this section may be removed on publication as an 522 RFC. 524 9. Acknowledgments 526 The authors thank Matt Mathis for his insights in FACK and Michael 527 Welzl for his per-packet timer idea that inspired this work. Nandita 528 Dukkipati, Eric Dumazet, Randy Stewart, Van Jacobson, Ian Swett, and 529 Jana Iyengar contributed to the algorithm and the implementations in 530 Linux, FreeBSD and QUIC. 532 10. References 534 10.1. Normative References 536 [RFC793] Postel, J., "Transmission Control Protocol", RFC 793, September 537 1981. 539 [RFC2018] Mathis, M. and J. Mahdavi, "TCP Selective Acknowledgment 540 Options", RFC 2018, October 1996.
542 [RFC6937] Mathis, M., Dukkipati, N., and Y. Cheng, "Proportional 543 Rate Reduction for TCP", RFC 6937, May 2013. 545 [RFC4737] Morton, A., Ciavattone, L., Ramachandran, G., Shalunov, 546 S., and J. Perser, "Packet Reordering Metrics", RFC 4737, 547 November 2006. 549 [RFC6675] Blanton, E., Allman, M., Wang, L., Jarvinen, I., Kojo, M., 550 and Y. Nishida, "A Conservative Loss Recovery Algorithm 551 Based on Selective Acknowledgment (SACK) for TCP", 552 RFC 6675, August 2012. 554 [RFC6298] Paxson, V., Allman, M., Chu, J., and M. Sargent, 555 "Computing TCP's Retransmission Timer", RFC 6298, June 556 2011. 558 [RFC5827] Allman, M., Ayesta, U., Wang, L., Blanton, J., and P. 559 Hurtig, "Early Retransmit for TCP and Stream Control 560 Transmission Protocol (SCTP)", RFC 5827, April 2010. 562 [RFC5682] Sarolahti, P., Kojo, M., Yamamoto, K., and M. Hata, 563 "Forward RTO-Recovery (F-RTO): An Algorithm for Detecting 564 Spurious Retransmission Timeouts with TCP", RFC 5682, 565 September 2009. 567 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 568 Requirement Levels", RFC 2119, March 1997. 570 [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion 571 Control", RFC 5681, September 2009. 573 [RFC2883] Floyd, S., Mahdavi, J., Mathis, M., and M. Podolsky, "An 574 Extension to the Selective Acknowledgement (SACK) Option 575 for TCP", RFC 2883, July 2000. 577 [RFC7323] Borman, D., Braden, B., Jacobson, V., and R. 578 Scheffenegger, "TCP Extensions for High Performance", RFC 7323, 579 September 2014. 581 10.2. Informative References 583 [FACK] Mathis, M. and J. Mahdavi, "Forward acknowledgement: 584 refining TCP congestion control", ACM SIGCOMM Computer 585 Communication Review, Volume 26, Issue 4, Oct. 1996. 588 [TLP] Dukkipati, N., Cardwell, N., Cheng, Y., and M. Mathis, 589 "Tail Loss Probe (TLP): An Algorithm for Fast Recovery of 590 Tail Drops", draft-dukkipati-tcpm-tcp-loss-probe-01 (work 591 in progress), August 2013.
593 [RFC7765] Hurtig, P., Brunstrom, A., Petlund, A., and M. Welzl, "TCP 594 and SCTP RTO Restart", RFC 7765, February 2016. 596 [REORDER-DETECT] 597 Zimmermann, A., Schulte, L., Wolff, C., and A. Hannemann, 598 "Detection and Quantification of Packet Reordering with 599 TCP", draft-zimmermann-tcpm-reordering-detection-02 (work 600 in progress), November 2014. 602 [QUIC-LR] Iyengar, J. and I. Swett, "QUIC Loss Recovery And 603 Congestion Control", draft-tsvwg-quic-loss-recovery-01 604 (work in progress), June 2016. 606 [THIN-STREAM] 607 Petlund, A., Evensen, K., Griwodz, C., and P. Halvorsen, 608 "TCP enhancements for interactive thin-stream 609 applications", NOSSDAV, 2008. 611 [SCWA99] Savage, S., Cardwell, N., Wetherall, D., and T. Anderson, 612 "TCP Congestion Control With a Misbehaving Receiver", ACM 613 Computer Communication Review, 29(5), 1999. 615 [POLICER16] 616 Flach, T., Papageorge, P., Terzis, A., Pedrosa, L., Cheng, 617 Y., Karim, T., Katz-Bassett, E., and R. Govindan, "An 618 Analysis of Traffic Policing in the Web", ACM SIGCOMM, 619 2016. 621 Authors' Addresses 623 Yuchung Cheng 624 Google, Inc 625 1600 Amphitheater Parkway 626 Mountain View, California 94043 627 USA 629 Email: ycheng@google.com 631 Neal Cardwell 632 Google, Inc 633 76 Ninth Avenue 634 New York, NY 10011 635 USA 637 Email: ncardwell@google.com