idnits 2.17.1 draft-ietf-quic-recovery-05.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack a Security Considerations section. ** There is 1 instance of too long lines in the document, the longest one being 1 character in excess of 72. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year == The document seems to lack the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. (The document does seem to have the reference to RFC 2119 which the ID-Checklist requires). -- The document date (August 15, 2017) is 2445 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) No issues found here. Summary: 2 errors (**), 0 flaws (~~), 2 warnings (==), 1 comment (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 QUIC J. Iyengar, Ed. 3 Internet-Draft I. Swett, Ed. 4 Intended status: Standards Track Google 5 Expires: February 16, 2018 August 15, 2017 7 QUIC Loss Detection and Congestion Control 8 draft-ietf-quic-recovery-05 10 Abstract 12 This document describes loss detection and congestion control 13 mechanisms for QUIC. 15 Note to Readers 17 Discussion of this draft takes place on the QUIC working group 18 mailing list (quic@ietf.org), which is archived at 19 https://mailarchive.ietf.org/arch/search/?email_list=quic. 21 Working Group information can be found at https://github.com/quicwg; 22 source code and issues list for this draft can be found at 23 https://github.com/quicwg/base-drafts/labels/recovery. 25 Status of This Memo 27 This Internet-Draft is submitted in full conformance with the 28 provisions of BCP 78 and BCP 79. 30 Internet-Drafts are working documents of the Internet Engineering 31 Task Force (IETF). Note that other groups may also distribute 32 working documents as Internet-Drafts. The list of current Internet- 33 Drafts is at http://datatracker.ietf.org/drafts/current/. 35 Internet-Drafts are draft documents valid for a maximum of six months 36 and may be updated, replaced, or obsoleted by other documents at any 37 time. It is inappropriate to use Internet-Drafts as reference 38 material or to cite them other than as "work in progress." 40 This Internet-Draft will expire on February 16, 2018. 42 Copyright Notice 44 Copyright (c) 2017 IETF Trust and the persons identified as the 45 document authors. All rights reserved. 47 This document is subject to BCP 78 and the IETF Trust's Legal 48 Provisions Relating to IETF Documents 49 (http://trustee.ietf.org/license-info) in effect on the date of 50 publication of this document. Please review these documents 51 carefully, as they describe your rights and restrictions with respect 52 to this document. Code Components extracted from this document must 53 include Simplified BSD License text as described in Section 4.e of 54 the Trust Legal Provisions and are provided without warranty as 55 described in the Simplified BSD License. 57 Table of Contents 59 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 60 1.1. Notational Conventions . . . . . . . . . . . . . . . . . 3 61 2. Design of the QUIC Transmission Machinery . . . . . . . . . . 3 62 2.1. Relevant Differences Between QUIC and TCP . . . . . . . . 4 63 2.1.1. Monotonically Increasing Packet Numbers . . . . . . . 4 64 2.1.2. No Reneging . . . . . . . . . . . . . . . . . . . . . 4 65 2.1.3. More ACK Ranges . . . . . . . . . . . . . . . . . . . 5 66 2.1.4. Explicit Correction For Delayed Acks . . . . . . . . 5 67 3. Loss Detection . . . . . . . . . . . . . . . . . . . . . . . 5 68 3.1. Overview . . . . . . . . . . . . . . . . . . . . . . . . 5 69 3.2. Algorithm Details . . . . . . . . . . . . . . . . . . . . 6 70 3.2.1. Constants of interest . . . . . . . . . . . . . . . . 6 71 3.2.2. Variables of interest . . . . . . . . . . . . . . . . 6 72 3.2.3. Initialization . . . . . . . . . . . . . . . . . . . 8 73 3.2.4. On Sending a Packet . . . . . . . . . . . . . . . . . 8 74 3.2.5. On Ack Receipt . . . . . . . . . . . . . . . . . . . 9 75 3.2.6. On Packet Acknowledgment . . . . . . . . . . . . . . 9 76 3.2.7. Setting the Loss Detection Alarm . . . . . . . . . . 10 77 3.2.8. On Alarm Firing . . . . . . . . . . . . . . . . . . . 12 78 3.2.9. Detecting Lost Packets . . . . . . . . . . . . . . . 13 79 3.3. Discussion . . . . . . . . . . . . . . . . . . . . . . . 14 80 4. Congestion Control . . . . . . . . . . . . . . . . . . . . . 14 81 4.1. Slow Start . . . . . . . . . . . . . . . . . . . . . . . 15 82 4.2. Recovery . . . . . . . . . . . . . . . . . . . . . . . . 15 83 4.3. Constants of interest . . . . . . . . . . . . . . . . . . 15 84 4.4. Variables of interest . . . . . . . . . . . . . . . . . . 15 85 4.5. Initialization . . . . . . . . . . . . . . . . . . . . . 16 86 4.6. On Packet Acknowledgement . . . . . . . . . . . . . . . . 16 87 4.7. On Packets Lost . . . . . . . . . . . . . . . . . . . . . 16 88 4.8. On Retransmission Timeout Verified . . . . . . . . . . . 17 89 4.9. Pacing Packets . . . . . . . . . . . . . . . . . . . . . 17 90 5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 17 91 6. References . . . . . . . . . . . . . . . . . . . . . . . . . 17 92 6.1. Normative References . . . . . . . . . . . . . . . . . . 17 93 6.2. Informative References . . . . . . . . . . . . . . . . . 17 94 Appendix A. Acknowledgments . . . . . . . . . . . . . . . . . . 18 95 Appendix B. Change Log . . . . . . . . . . . . . . . . . . . . . 18 96 B.1. Since draft-ietf-quic-recovery-04 . . . . . . . . . . . . 18 97 B.2. Since draft-ietf-quic-recovery-03 . . . . . . . . . . . . 18 98 B.3. Since draft-ietf-quic-recovery-02 . . . . . . . . . . . . 18 99 B.4. Since draft-ietf-quic-recovery-01 . . . . . . . . . . . . 19 100 B.5. Since draft-ietf-quic-recovery-00 . . . . . . . . . . . . 19 101 B.6. Since draft-iyengar-quic-loss-recovery-01 . . . . . . . . 19 102 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 19 104 1. Introduction 106 QUIC is a new multiplexed and secure transport atop UDP. QUIC builds 107 on decades of transport and security experience, and implements 108 mechanisms that make it attractive as a modern general-purpose 109 transport. The QUIC protocol is described in [QUIC-TRANSPORT]. 111 QUIC implements the spirit of known TCP loss recovery mechanisms, 112 described in RFCs, various Internet-drafts, and also those prevalent 113 in the Linux TCP implementation. This document describes QUIC 114 congestion control and loss recovery, and where applicable, 115 attributes the TCP equivalent in RFCs, Internet-drafts, academic 116 papers, and/or TCP implementations. 118 1.1. Notational Conventions 120 The words "MUST", "MUST NOT", "SHOULD", and "MAY" are used in this 121 document. It's not shouting; when they are capitalized, they have 122 the special meaning defined in [RFC2119]. 124 2. Design of the QUIC Transmission Machinery 126 All transmissions in QUIC are sent with a packet-level header, which 127 includes a packet sequence number (referred to below as a packet 128 number). These packet numbers never repeat in the lifetime of a 129 connection, and are monotonically increasing, which makes duplicate 130 detection trivial. This fundamental design decision obviates the 131 need for disambiguating between transmissions and retransmissions and 132 eliminates significant complexity from QUIC's interpretation of TCP 133 loss detection mechanisms. 135 Every packet may contain several frames. We outline the frames that 136 are important to the loss detection and congestion control machinery 137 below. 139 o Retransmittable frames are frames requiring reliable delivery. 140 The most common are STREAM frames, which typically contain 141 application data. 143 o Crypto handshake data is sent on stream 0, and uses the 144 reliability machinery of QUIC underneath. 146 o ACK frames contain acknowledgment information. QUIC uses a SACK- 147 based scheme, where acks express up to 256 ranges. The ACK frame 148 also includes a receive timestamp for each packet newly acked. 150 2.1. Relevant Differences Between QUIC and TCP 152 Readers familiar with TCP's loss detection and congestion control 153 will find algorithms here that parallel well-known TCP ones. 154 Protocol differences between QUIC and TCP however contribute to 155 algorithmic differences. We briefly describe these protocol 156 differences below. 158 2.1.1. Monotonically Increasing Packet Numbers 160 TCP conflates transmission sequence number at the sender with 161 delivery sequence number at the receiver, which results in 162 retransmissions of the same data carrying the same sequence number, 163 and consequently to problems caused by "retransmission ambiguity". 164 QUIC separates the two: QUIC uses a packet sequence number (referred 165 to as the "packet number") for transmissions, and any data that is to 166 be delivered to the receiving application(s) is sent in one or more 167 streams, with stream offsets encoded within STREAM frames inside of 168 packets that determine delivery order. 170 QUIC's packet number is strictly increasing, and directly encodes 171 transmission order. A higher QUIC packet number signifies that the 172 packet was sent later, and a lower QUIC packet number signifies that 173 the packet was sent earlier. When a packet containing frames is 174 deemed lost, QUIC rebundles necessary frames in a new packet with a 175 new packet number, removing ambiguity about which packet is 176 acknowledged when an ACK is received. Consequently, more accurate 177 RTT measurements can be made, spurious retransmissions are trivially 178 detected, and mechanisms such as Fast Retransmit can be applied 179 universally, based only on packet number. 181 This design point significantly simplifies loss detection mechanisms 182 for QUIC. Most TCP mechanisms implicitly attempt to infer 183 transmission ordering based on TCP sequence numbers - a non-trivial 184 task, especially when TCP timestamps are not available. 186 2.1.2. No Reneging 188 QUIC ACKs contain information that is equivalent to TCP SACK, but 189 QUIC does not allow any acked packet to be reneged, greatly 190 simplifying implementations on both sides and reducing memory 191 pressure on the sender. 193 2.1.3. More ACK Ranges 195 QUIC supports up to 256 ACK ranges, opposed to TCP's 3 SACK ranges. 196 In high loss environments, this speeds recovery. 198 2.1.4. Explicit Correction For Delayed Acks 200 QUIC ACKs explicitly encode the delay incurred at the receiver 201 between when a packet is received and when the corresponding ACK is 202 sent. This allows the receiver of the ACK to adjust for receiver 203 delays, specifically the delayed ack timer, when estimating the path 204 RTT. This mechanism also allows a receiver to measure and report the 205 delay from when a packet was received by the OS kernel, which is 206 useful in receivers which may incur delays such as context-switch 207 latency before a userspace QUIC receiver processes a received packet. 209 3. Loss Detection 211 3.1. Overview 213 QUIC uses a combination of ack information and alarms to detect lost 214 packets. An unacknowledged QUIC packet is marked as lost in one of 215 the following ways: 217 o A packet is marked as lost if at least one packet that was sent a 218 threshold number of packets (kReorderingThreshold) after it has 219 been acknowledged. This indicates that the unacknowledged packet 220 is either lost or reordered beyond the specified threshold. This 221 mechanism combines both TCP's FastRetransmit and FACK mechanisms. 223 o If a packet is near the tail, where fewer than 224 kReorderingThreshold packets are sent after it, the sender cannot 225 expect to detect loss based on the previous mechanism. In this 226 case, a sender uses both ack information and an alarm to detect 227 loss. Specifically, when the last sent packet is acknowledged, 228 the sender waits a short period of time to allow for reordering 229 and then marks any unacknowledged packets as lost. This mechanism 230 is based on the Linux implementation of TCP Early Retransmit. 232 o If a packet is sent at the tail, there are no packets sent after 233 it, and the sender cannot use ack information to detect its loss. 234 The sender therefore relies on an alarm to detect such tail 235 losses. This mechanism is based on TCP's Tail Loss Probe. 237 o If all else fails, a Retransmission Timeout (RTO) alarm is always 238 set when any retransmittable packet is outstanding. When this 239 alarm fires, all unacknowledged packets are marked as lost. 241 o Instead of a packet threshold to tolerate reordering, a QUIC 242 sender may use a time threshold. This allows for senders to be 243 tolerant of short periods of significant reordering. In this 244 mechanism, a QUIC sender marks a packet as lost when a packet 245 larger than it is acknowledged and a threshold amount of time has 246 passed since the packet was sent. 248 o Handshake packets, which contain STREAM frames for stream 0, are 249 critical to QUIC transport and crypto negotiation, so a separate 250 alarm period is used for them. 252 3.2. Algorithm Details 254 3.2.1. Constants of interest 256 Constants used in loss recovery are based on a combination of RFCs, 257 papers, and common practice. Some may need to be changed or 258 negotiated in order to better suit a variety of environments. 260 kMaxTLPs (default 2): Maximum number of tail loss probes before an 261 RTO fires. 263 kReorderingThreshold (default 3): Maximum reordering in packet 264 number space before FACK style loss detection considers a packet 265 lost. 267 kTimeReorderingFraction (default 1/8): Maximum reordering in time 268 space before time based loss detection considers a packet lost. 269 In fraction of an RTT. 271 kMinTLPTimeout (default 10ms): Minimum time in the future a tail 272 loss probe alarm may be set for. 274 kMinRTOTimeout (default 200ms): Minimum time in the future an RTO 275 alarm may be set for. 277 kDelayedAckTimeout (default 25ms): The length of the peer's delayed 278 ack timer. 280 kDefaultInitialRtt (default 100ms): The default RTT used before an 281 RTT sample is taken. 283 3.2.2. Variables of interest 285 Variables required to implement the congestion control mechanisms are 286 described in this section. 288 loss_detection_alarm: Multi-modal alarm used for loss detection. 290 handshake_count: The number of times the handshake packets have been 291 retransmitted without receiving an ack. 293 tlp_count: The number of times a tail loss probe has been sent 294 without receiving an ack. 296 rto_count: The number of times an rto has been sent without 297 receiving an ack. 299 largest_sent_before_rto: The last packet number sent prior to the 300 first retransmission timeout. 302 time_of_last_sent_packet: The time the most recent packet was sent. 304 largest_sent_packet: The packet number of the most recently sent 305 packet. 307 largest_acked_packet: The largest packet number acknowledged in an 308 ack frame. 310 latest_rtt: The most recent RTT measurement made when receiving an 311 ack for a previously unacked packet. 313 smoothed_rtt: The smoothed RTT of the connection, computed as 314 described in [RFC6298] 316 rttvar: The RTT variance, computed as described in [RFC6298] 318 reordering_threshold: The largest delta between the largest acked 319 retransmittable packet and a packet containing retransmittable 320 frames before it's declared lost. 322 time_reordering_fraction: The reordering window as a fraction of 323 max(smoothed_rtt, latest_rtt). 325 loss_time: The time at which the next packet will be considered lost 326 based on early transmit or exceeding the reordering window in 327 time. 329 sent_packets: An association of packet numbers to information about 330 them, including a number field indicating the packet number, a 331 time field indicating the time a packet was sent, and a bytes 332 field indicating the packet's size. sent_packets is ordered by 333 packet number, and packets remain in sent_packets until 334 acknowledged or lost. 336 3.2.3. Initialization 338 At the beginning of the connection, initialize the loss detection 339 variables as follows: 341 loss_detection_alarm.reset() 342 handshake_count = 0 343 tlp_count = 0 344 rto_count = 0 345 if (UsingTimeLossDetection()) 346 reordering_threshold = infinite 347 time_reordering_fraction = kTimeReorderingFraction 348 else: 349 reordering_threshold = kReorderingThreshold 350 time_reordering_fraction = infinite 351 loss_time = 0 352 smoothed_rtt = 0 353 rttvar = 0 354 largest_sent_before_rto = 0 355 time_of_last_sent_packet = 0 356 largest_sent_packet = 0 358 3.2.4. On Sending a Packet 360 After any packet is sent, be it a new transmission or a rebundled 361 transmission, the following OnPacketSent function is called. The 362 parameters to OnPacketSent are as follows: 364 o packet_number: The packet number of the sent packet. 366 o is_retransmittable: A boolean that indicates whether the packet 367 contains at least one frame requiring reliable deliver. The 368 retransmittability of various QUIC frames is described in 369 [QUIC-TRANSPORT]. If false, it is still acceptable for an ack to 370 be received for this packet. However, a caller MUST NOT set 371 is_retransmittable to true if an ack is not expected. 373 o sent_bytes: The number of bytes sent in the packet. 375 Pseudocode for OnPacketSent follows: 377 OnPacketSent(packet_number, is_retransmittable, sent_bytes): 378 time_of_last_sent_packet = now 379 largest_sent_packet = packet_number 380 sent_packets[packet_number].packet_number = packet_number 381 sent_packets[packet_number].time = now 382 if is_retransmittable: 383 sent_packets[packet_number].bytes = sent_bytes 384 SetLossDetectionAlarm() 386 3.2.5. On Ack Receipt 388 When an ack is received, it may acknowledge 0 or more packets. 390 Pseudocode for OnAckReceived and UpdateRtt follow: 392 OnAckReceived(ack): 393 largest_acked_packet = ack.largest_acked 394 // If the largest acked is newly acked, update the RTT. 395 if (sent_packets[ack.largest_acked]): 396 latest_rtt = now - sent_packets[ack.largest_acked].time 397 if (latest_rtt > ack.ack_delay): 398 latest_rtt -= ack.delay 399 UpdateRtt(latest_rtt) 400 // Find all newly acked packets. 401 for acked_packet in DetermineNewlyAckedPackets(): 402 OnPacketAcked(acked_packet.packet_number) 404 DetectLostPackets(ack.largest_acked_packet) 405 SetLossDetectionAlarm() 407 UpdateRtt(latest_rtt): 408 // Based on {{RFC6298}}. 409 if (smoothed_rtt == 0): 410 smoothed_rtt = latest_rtt 411 rttvar = latest_rtt / 2 412 else: 413 rttvar = 3/4 * rttvar + 1/4 * (smoothed_rtt - latest_rtt) 414 smoothed_rtt = 7/8 * smoothed_rtt + 1/8 * latest_rtt 416 3.2.6. On Packet Acknowledgment 418 When a packet is acked for the first time, the following 419 OnPacketAcked function is called. Note that a single ACK frame may 420 newly acknowledge several packets. OnPacketAcked must be called once 421 for each of these newly acked packets. 423 OnPacketAcked takes one parameter, acked_packet, which is the packet 424 number of the newly acked packet, and returns a list of packet 425 numbers that are detected as lost. 427 If this is the first acknowledgement following RTO, check if the 428 smallest newly acknowledged packet is one sent by the RTO, and if so, 429 inform congestion control of a verified RTO, similar to F-RTO 430 [RFC5682] 432 Pseudocode for OnPacketAcked follows: 434 OnPacketAcked(acked_packet_number): 435 OnPacketAckedCC(acked_packet_number) 436 // If a packet sent prior to RTO was acked, then the RTO 437 // was spurious. Otherwise, inform congestion control. 438 if (rto_count > 0 && 439 acked_packet_number > largest_sent_before_rto) 440 OnRetransmissionTimeoutVerified() 441 handshake_count = 0 442 tlp_count = 0 443 rto_count = 0 444 sent_packets.remove(acked_packet_number) 446 3.2.7. Setting the Loss Detection Alarm 448 QUIC loss detection uses a single alarm for all timer-based loss 449 detection. The duration of the alarm is based on the alarm's mode, 450 which is set in the packet and timer events further below. The 451 function SetLossDetectionAlarm defined below shows how the single 452 timer is set based on the alarm mode. 454 3.2.7.1. Handshake Packets 456 The initial flight has no prior RTT sample. A client SHOULD remember 457 the previous RTT it observed when resumption is attempted and use 458 that for an initial RTT value. If no previous RTT is available, the 459 initial RTT defaults to 100ms. 461 Endpoints MUST retransmit handshake frames if not acknowledged within 462 a time limit. This time limit will start as the largest of twice the 463 RTT value and MinTLPTimeout. Each consecutive handshake 464 retransmission doubles the time limit, until an acknowledgement is 465 received. 467 Handshake frames may be cancelled by handshake state transitions. In 468 particular, all non-protected frames SHOULD be no longer be 469 transmitted once packet protection is available. 471 When stateless rejects are in use, the connection is considered 472 immediately closed once a reject is sent, so no timer is set to 473 retransmit the reject. 475 Version negotiation packets are always stateless, and MUST be sent 476 once per handshake packet that uses an unsupported QUIC version, and 477 MAY be sent in response to 0RTT packets. 479 3.2.7.2. Tail Loss Probe and Retransmission Timeout 481 Tail loss probes [LOSS-PROBE] and retransmission timeouts [RFC6298] 482 are an alarm based mechanism to recover from cases when there are 483 outstanding retransmittable packets, but an acknowledgement has not 484 been received in a timely manner. 486 3.2.7.3. Early Retransmit 488 Early retransmit [RFC5827] is implemented with a 1/4 RTT timer. It 489 is part of QUIC's time based loss detection, but is always enabled, 490 even when only packet reordering loss detection is enabled. 492 3.2.7.4. Pseudocode 494 Pseudocode for SetLossDetectionAlarm follows: 496 SetLossDetectionAlarm(): 497 if (retransmittable packets are not outstanding): 498 loss_detection_alarm.cancel() 499 return 501 if (handshake packets are outstanding): 502 // Handshake retransmission alarm. 503 if (smoothed_rtt == 0): 504 alarm_duration = 2 * kDefaultInitialRtt 505 else: 506 alarm_duration = 2 * smoothed_rtt 507 alarm_duration = max(alarm_duration, kMinTLPTimeout) 508 alarm_duration = alarm_duration * (2 ^ handshake_count) 509 else if (loss_time != 0): 510 // Early retransmit timer or time loss detection. 511 alarm_duration = loss_time - now 512 else if (tlp_count < kMaxTLPs): 513 // Tail Loss Probe 514 if (retransmittable_packets_outstanding = 1): 515 alarm_duration = 1.5 * smoothed_rtt + kDelayedAckTimeout 516 else: 517 alarm_duration = kMinTLPTimeout 518 alarm_duration = max(alarm_duration, 2 * smoothed_rtt) 519 else: 520 // RTO alarm 521 alarm_duration = smoothed_rtt + 4 * rttvar 522 alarm_duration = max(alarm_duration, kMinRTOTimeout) 523 alarm_duration = alarm_duration * (2 ^ rto_count) 525 loss_detection_alarm.set(now + alarm_duration) 527 3.2.8. On Alarm Firing 529 QUIC uses one loss recovery alarm, which when set, can be in one of 530 several modes. When the alarm fires, the mode determines the action 531 to be performed. 533 Pseudocode for OnLossDetectionAlarm follows: 535 OnLossDetectionAlarm(): 536 if (handshake packets are outstanding): 537 // Handshake retransmission alarm. 538 RetransmitAllHandshakePackets() 539 handshake_count++ 540 else if (loss_time != 0): 541 // Early retransmit or Time Loss Detection 542 DetectLostPackets(largest_acked_packet) 543 else if (tlp_count < kMaxTLPs): 544 // Tail Loss Probe. 545 SendOnePacket() 546 tlp_count++ 547 else: 548 // RTO. 549 if (rto_count == 0) 550 largest_sent_before_rto = largest_sent_packet 551 SendTwoPackets() 552 rto_count++ 554 SetLossDetectionAlarm() 556 3.2.9. Detecting Lost Packets 558 Packets in QUIC are only considered lost once a larger packet number 559 is acknowledged. DetectLostPackets is called every time an ack is 560 received. If the loss detection alarm fires and the loss_time is 561 set, the previous largest acked packet is supplied. 563 3.2.9.1. Handshake Packets 565 The receiver MUST ignore unprotected packets that ack protected 566 packets. The receiver MUST trust protected acks for unprotected 567 packets, however. Aside from this, loss detection for handshake 568 packets when an ack is processed is identical to other packets. 570 3.2.9.2. Pseudocode 572 DetectLostPackets takes one parameter, acked, which is the largest 573 acked packet. 575 Pseudocode for DetectLostPackets follows: 577 DetectLostPackets(largest_acked): 578 loss_time = 0 579 lost_packets = {} 580 delay_until_lost = infinite 581 if (time_reordering_fraction != infinite): 582 delay_until_lost = 583 (1 + time_reordering_fraction) * max(latest_rtt, smoothed_rtt) 584 else if (largest_acked.packet_number == largest_sent_packet): 585 // Early retransmit alarm. 586 delay_until_lost = 9/8 * max(latest_rtt, smoothed_rtt) 587 foreach (unacked < largest_acked.packet_number): 588 time_since_sent = now() - unacked.time_sent 589 packet_delta = largest_acked.packet_number - unacked.packet_number 590 if (time_since_sent > delay_until_lost): 591 lost_packets.insert(unacked) 592 else if (packet_delta > reordering_threshold) 593 lost_packets.insert(unacked) 594 else if (loss_time == 0 && delay_until_lost != infinite): 595 loss_time = now() + delay_until_lost - time_since_sent 597 // Inform the congestion controller of lost packets and 598 // lets it decide whether to retransmit immediately. 599 if (!lost_packets.empty()) 600 OnPacketsLost(lost_packets) 601 foreach (packet in lost_packets) 602 sent_packets.remove(packet.packet_number) 604 3.3. Discussion 606 The majority of constants were derived from best common practices 607 among widely deployed TCP implementations on the internet. 608 Exceptions follow. 610 A shorter delayed ack time of 25ms was chosen because longer delayed 611 acks can delay loss recovery and for the small number of connections 612 where less than packet per 25ms is delivered, acking every packet is 613 beneficial to congestion control and loss recovery. 615 The default initial RTT of 100ms was chosen because it is slightly 616 higher than both the median and mean min_rtt typically observed on 617 the public internet. 619 4. Congestion Control 621 QUIC's congestion control is based on TCP NewReno[RFC6582] congestion 622 control to determine the congestion window and pacing rate. 624 4.1. Slow Start 626 QUIC begins every connection in slow start and exits slow start upon 627 loss. While in slow start, QUIC increases the congestion window by 628 the number of acknowledged bytes when each ack is processed. 630 4.2. Recovery 632 Recovery is a period of time beginning with detection of a lost 633 packet. It ends when all packets outstanding at the time recovery 634 began have been acknowledged or lost. During recovery, the 635 congestion window is not increased or decreased. 637 4.3. Constants of interest 639 Constants used in congestion control are based on a combination of 640 RFCs, papers, and common practice. Some may need to be changed or 641 negotiated in order to better suit a variety of environments. 643 kDefaultMss (default 1460 bytes): The default max packet size used 644 for calculating default and minimum congestion windows. 646 kInitialWindow (default 10 * kDefaultMss): Default limit on the 647 amount of outstanding data in bytes. 649 kMinimumWindow (default 2 * kDefaultMss): Default minimum congestion 650 window. 652 kLossReductionFactor (default 0.5): Reduction in congestion window 653 when a new loss event is detected. 655 4.4. Variables of interest 657 Variables required to implement the congestion control mechanisms are 658 described in this section. 660 bytes_in_flight: The sum of the size in bytes of all sent packets 661 that contain at least one retransmittable or PADDING frame, and 662 have not been acked or declared lost. The size does not include 663 IP or UDP overhead. Ack only frames do not count towards 664 byte_in_flight. 666 congestion_window: Maximum number of bytes in flight that may be 667 sent. 669 end_of_recovery: The packet number after which QUIC will no longer 670 be in recovery. 672 ssthresh Slow start threshold in bytes. When the congestion window 673 is below ssthresh, it grows by the number of bytes acknowledged 674 for each ack. 676 4.5. Initialization 678 At the beginning of the connection, initialize the loss detection 679 variables as follows: 681 congestion_window = kInitialWindow 682 bytes_in_flight = 0 683 end_of_recovery = 0 684 ssthresh = infinite 686 4.6. On Packet Acknowledgement 688 Invoked from loss detection's OnPacketAcked and is supplied with 689 acked_packet from sent_packets. 691 Pseudocode for OnPacketAckedCC follows: 693 OnPacketAckedCC(acked_packet): 694 if (acked_packet.packet_number < end_of_recovery): 695 return 696 if (congestion_window < ssthresh): 697 congestion_window += acket_packets.bytes 698 else: 699 congestion_window += 700 acked_packets.bytes / congestion_window 702 4.7. On Packets Lost 704 Invoked by loss detection from DetectLostPackets when new packets are 705 detected lost. 707 OnPacketsLost(lost_packets): 708 largest_lost_packet = lost_packets.last() 709 // Start a new recovery epoch if the lost packet is larger 710 // than the end of the previous recovery epoch. 711 if (end_of_recovery < largest_lost_packet.packet_number): 712 end_of_recovery = largest_sent_packet 713 congestion_window *= kLossReductionFactor 714 congestion_window = max(congestion_window, kMinimumWindow) 715 ssthresh = congestion_window 717 4.8. On Retransmission Timeout Verified 719 QUIC decreases the congestion window to the minimum value once the 720 retransmission timeout has been confirmed to not be spurious when the 721 first post-RTO acknowledgement is processed. 723 OnRetransmissionTimeoutVerified() 724 congestion_window = kMinimumWindow 726 4.9. Pacing Packets 728 QUIC sends a packet if there is available congestion window and 729 sending the packet does not exceed the pacing rate. 731 TimeToSend returns infinite if the congestion controller is 732 congestion window limited, a time in the past if the packet can be 733 sent immediately, and a time in the future if sending is pacing 734 limited. 736 TimeToSend(packet_size): 737 if (bytes_in_flight + packet_size > congestion_window) 738 return infinite 739 return time_of_last_sent_packet + 740 (packet_size * smoothed_rtt) / congestion_window 742 5. IANA Considerations 744 This document has no IANA actions. Yet. 746 6. References 748 6.1. Normative References 750 [QUIC-TRANSPORT] 751 Iyengar, J., Ed. and M. Thomson, Ed., "QUIC: A UDP-Based 752 Multiplexed and Secure Transport", draft-ietf-quic- 753 transport (work in progress), August 2017. 755 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 756 Requirement Levels", BCP 14, RFC 2119, 757 DOI 10.17487/RFC2119, March 1997, 758 . 760 6.2. Informative References 762 [LOSS-PROBE] 763 Dukkipati, N., Cardwell, N., Cheng, Y., and M. Mathis, 764 "Tail Loss Probe (TLP): An Algorithm for Fast Recovery of 765 Tail Losses", draft-dukkipati-tcpm-tcp-loss-probe-01 (work 766 in progress), February 2013. 768 [RFC5682] Sarolahti, P., Kojo, M., Yamamoto, K., and M. Hata, 769 "Forward RTO-Recovery (F-RTO): An Algorithm for Detecting 770 Spurious Retransmission Timeouts with TCP", RFC 5682, 771 DOI 10.17487/RFC5682, September 2009, 772 . 774 [RFC5827] Allman, M., Avrachenkov, K., Ayesta, U., Blanton, J., and 775 P. Hurtig, "Early Retransmit for TCP and Stream Control 776 Transmission Protocol (SCTP)", RFC 5827, 777 DOI 10.17487/RFC5827, May 2010, 778 . 780 [RFC6298] Paxson, V., Allman, M., Chu, J., and M. Sargent, 781 "Computing TCP's Retransmission Timer", RFC 6298, 782 DOI 10.17487/RFC6298, June 2011, 783 . 785 [RFC6582] Henderson, T., Floyd, S., Gurtov, A., and Y. Nishida, "The 786 NewReno Modification to TCP's Fast Recovery Algorithm", 787 RFC 6582, DOI 10.17487/RFC6582, April 2012, 788 . 790 Appendix A. Acknowledgments 792 Appendix B. Change Log 794 *RFC Editor's Note:* Please remove this section prior to 795 publication of a final version of this document. 797 B.1. Since draft-ietf-quic-recovery-04 799 No significant changes. 801 B.2. Since draft-ietf-quic-recovery-03 803 No significant changes. 805 B.3. Since draft-ietf-quic-recovery-02 807 o Integrate F-RTO (#544, #409) 809 o Add congestion control (#545, #395) 810 o Require connection abort if a skipped packet was acknowledged 811 (#415) 813 o Simplify RTO calculations (#142, #417) 815 B.4. Since draft-ietf-quic-recovery-01 817 o Overview added to loss detection 819 o Changes initial default RTT to 100ms 821 o Added time-based loss detection and fixes early retransmit 823 o Clarified loss recovery for handshake packets 825 o Fixed references and made TCP references informative 827 B.5. Since draft-ietf-quic-recovery-00 829 o Improved description of constants and ACK behavior 831 B.6. Since draft-iyengar-quic-loss-recovery-01 833 o Adopted as base for draft-ietf-quic-recovery 835 o Updated authors/editors list 837 o Added table of contents 839 Authors' Addresses 841 Jana Iyengar (editor) 842 Google 844 Email: jri@google.com 846 Ian Swett (editor) 847 Google 849 Email: ianswett@google.com