idnits 2.17.1

draft-ietf-quic-recovery-04.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ------------------------------------------------------------------------

     No issues found here.

  Checking nits according to
  https://www.ietf.org/id-info/1id-guidelines.txt:
  ------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ------------------------------------------------------------------------

  ** The document seems to lack a Security Considerations section.

  ** There is 1 instance of too long lines in the document, the longest
     one being 1 character in excess of 72.

  Miscellaneous warnings:
  ------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does
     not match the current year

  == The document seems to lack the recommended RFC 2119 boilerplate,
     even if it appears to use RFC 2119 keywords.

     (The document does seem to have the reference to RFC 2119 which the
     ID-Checklist requires).

  -- The document date (June 13, 2017) is 2509 days in the past.  Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative
     references to lower-maturity documents in RFCs)

     No issues found here.

     Summary: 2 errors (**), 0 flaws (~~), 2 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information
     about the items above.

--------------------------------------------------------------------------

QUIC                                                     J. Iyengar, Ed.
Internet-Draft                                             I. Swett, Ed.
Intended status: Standards Track                                  Google
Expires: December 15, 2017                                 June 13, 2017

               QUIC Loss Detection and Congestion Control
                      draft-ietf-quic-recovery-04

Abstract

   This document describes loss detection and congestion control
   mechanisms for QUIC.

Note to Readers

   Discussion of this draft takes place on the QUIC working group
   mailing list (quic@ietf.org), which is archived at
   https://mailarchive.ietf.org/arch/search/?email_list=quic.

   Working Group information can be found at https://github.com/quicwg;
   source code and issues list for this draft can be found at
   https://github.com/quicwg/base-drafts/labels/recovery.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on December 15, 2017.

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.
   Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Notational Conventions
   2.  Design of the QUIC Transmission Machinery
     2.1.  Relevant Differences Between QUIC and TCP
       2.1.1.  Monotonically Increasing Packet Numbers
       2.1.2.  No Reneging
       2.1.3.  More ACK Ranges
       2.1.4.  Explicit Correction For Delayed Acks
   3.  Loss Detection
     3.1.  Overview
     3.2.  Algorithm Details
       3.2.1.  Constants of interest
       3.2.2.  Variables of interest
       3.2.3.  Initialization
       3.2.4.  On Sending a Packet
       3.2.5.  On Ack Receipt
       3.2.6.  On Packet Acknowledgment
       3.2.7.  Setting the Loss Detection Alarm
       3.2.8.  On Alarm Firing
       3.2.9.  Detecting Lost Packets
     3.3.  Discussion
   4.  Congestion Control
     4.1.  Slow Start
     4.2.  Recovery
     4.3.  Constants of interest
     4.4.  Variables of interest
     4.5.  Initialization
     4.6.  On Packet Acknowledgement
     4.7.  On Packets Lost
     4.8.  On Retransmission Timeout Verified
     4.9.  Pacing Packets
   5.  IANA Considerations
   6.  References
     6.1.  Normative References
     6.2.  Informative References
   Appendix A.  Acknowledgments
   Appendix B.  Change Log
     B.1.  Since draft-ietf-quic-recovery-02
     B.2.  Since draft-ietf-quic-recovery-01
     B.3.  Since draft-ietf-quic-recovery-00
     B.4.  Since draft-iyengar-quic-loss-recovery-01
   Authors' Addresses

1.  Introduction

   QUIC is a new multiplexed and secure transport atop UDP.  QUIC builds
   on decades of transport and security experience, and implements
   mechanisms that make it attractive as a modern general-purpose
   transport.  The QUIC protocol is described in [QUIC-TRANSPORT].

   QUIC implements the spirit of known TCP loss recovery mechanisms,
   described in RFCs, various Internet-drafts, and also those prevalent
   in the Linux TCP implementation.  This document describes QUIC
   congestion control and loss recovery, and where applicable,
   attributes the TCP equivalent in RFCs, Internet-drafts, academic
   papers, and/or TCP implementations.

1.1.  Notational Conventions

   The words "MUST", "MUST NOT", "SHOULD", and "MAY" are used in this
   document.
   It's not shouting; when they are capitalized, they have
   the special meaning defined in [RFC2119].

2.  Design of the QUIC Transmission Machinery

   All transmissions in QUIC are sent with a packet-level header, which
   includes a packet sequence number (referred to below as a packet
   number).  These packet numbers never repeat in the lifetime of a
   connection, and are monotonically increasing, which makes duplicate
   detection trivial.  This fundamental design decision obviates the
   need for disambiguating between transmissions and retransmissions and
   eliminates significant complexity from QUIC's interpretation of TCP
   loss detection mechanisms.

   Every packet may contain several frames.  We outline the frames that
   are important to the loss detection and congestion control machinery
   below.

   o  Retransmittable frames are frames requiring reliable delivery.
      The most common are STREAM frames, which typically contain
      application data.

   o  Crypto handshake data is sent on stream 0, and uses the
      reliability machinery of QUIC underneath.

   o  ACK frames contain acknowledgment information.  QUIC uses a
      SACK-based scheme, where acks express up to 256 ranges.  The ACK
      frame also includes a receive timestamp for each packet newly
      acked.

2.1.  Relevant Differences Between QUIC and TCP

   Readers familiar with TCP's loss detection and congestion control
   will find algorithms here that parallel well-known TCP ones.
   Protocol differences between QUIC and TCP however contribute to
   algorithmic differences.  We briefly describe these protocol
   differences below.

2.1.1.  Monotonically Increasing Packet Numbers

   TCP conflates transmission sequence number at the sender with
   delivery sequence number at the receiver, which results in
   retransmissions of the same data carrying the same sequence number,
   and consequently to problems caused by "retransmission ambiguity".
   QUIC separates the two: QUIC uses a packet sequence number (referred
   to as the "packet number") for transmissions, and any data that is to
   be delivered to the receiving application(s) is sent in one or more
   streams, with stream offsets encoded within STREAM frames inside of
   packets that determine delivery order.

   QUIC's packet number is strictly increasing, and directly encodes
   transmission order.  A higher QUIC packet number signifies that the
   packet was sent later, and a lower QUIC packet number signifies that
   the packet was sent earlier.  When a packet containing frames is
   deemed lost, QUIC rebundles necessary frames in a new packet with a
   new packet number, removing ambiguity about which packet is
   acknowledged when an ACK is received.  Consequently, more accurate
   RTT measurements can be made, spurious retransmissions are trivially
   detected, and mechanisms such as Fast Retransmit can be applied
   universally, based only on packet number.

   This design point significantly simplifies loss detection mechanisms
   for QUIC.  Most TCP mechanisms implicitly attempt to infer
   transmission ordering based on TCP sequence numbers - a non-trivial
   task, especially when TCP timestamps are not available.

2.1.2.  No Reneging

   QUIC ACKs contain information that is equivalent to TCP SACK, but
   QUIC does not allow any acked packet to be reneged, greatly
   simplifying implementations on both sides and reducing memory
   pressure on the sender.

2.1.3.  More ACK Ranges

   QUIC supports up to 256 ACK ranges, as opposed to TCP's 3 SACK
   ranges.  In high loss environments, this speeds recovery.

2.1.4.  Explicit Correction For Delayed Acks

   QUIC ACKs explicitly encode the delay incurred at the receiver
   between when a packet is received and when the corresponding ACK is
   sent.
   This allows the receiver of the ACK to adjust for receiver
   delays, specifically the delayed ack timer, when estimating the path
   RTT.  This mechanism also allows a receiver to measure and report the
   delay from when a packet was received by the OS kernel, which is
   useful in receivers that may incur delays such as context-switch
   latency before a userspace QUIC receiver processes a received packet.

3.  Loss Detection

3.1.  Overview

   QUIC uses a combination of ack information and alarms to detect lost
   packets.  An unacknowledged QUIC packet is marked as lost in one of
   the following ways:

   o  A packet is marked as lost if at least one packet that was sent a
      threshold number of packets (kReorderingThreshold) after it has
      been acknowledged.  This indicates that the unacknowledged packet
      is either lost or reordered beyond the specified threshold.  This
      mechanism combines both TCP's Fast Retransmit and FACK mechanisms.

   o  If a packet is near the tail, where fewer than
      kReorderingThreshold packets are sent after it, the sender cannot
      expect to detect loss based on the previous mechanism.  In this
      case, a sender uses both ack information and an alarm to detect
      loss.  Specifically, when the last sent packet is acknowledged,
      the sender waits a short period of time to allow for reordering
      and then marks any unacknowledged packets as lost.  This mechanism
      is based on the Linux implementation of TCP Early Retransmit.

   o  If a packet is sent at the tail, there are no packets sent after
      it, and the sender cannot use ack information to detect its loss.
      The sender therefore relies on an alarm to detect such tail
      losses.  This mechanism is based on TCP's Tail Loss Probe.

   o  If all else fails, a Retransmission Timeout (RTO) alarm is always
      set when any retransmittable packet is outstanding.  When this
      alarm fires, all unacknowledged packets are marked as lost.
   o  Instead of a packet threshold to tolerate reordering, a QUIC
      sender may use a time threshold.  This allows for senders to be
      tolerant of short periods of significant reordering.  In this
      mechanism, a QUIC sender marks a packet as lost when a packet
      with a larger packet number is acknowledged and a threshold
      amount of time has passed since the packet was sent.

   o  Handshake packets, which contain STREAM frames for stream 0, are
      critical to QUIC transport and crypto negotiation, so a separate
      alarm period is used for them.

3.2.  Algorithm Details

3.2.1.  Constants of interest

   Constants used in loss recovery are based on a combination of RFCs,
   papers, and common practice.  Some may need to be changed or
   negotiated in order to better suit a variety of environments.

   kMaxTLPs (default 2):  Maximum number of tail loss probes before an
      RTO fires.

   kReorderingThreshold (default 3):  Maximum reordering in packet
      number space before FACK style loss detection considers a packet
      lost.

   kTimeReorderingFraction (default 1/8):  Maximum reordering in time
      space before time based loss detection considers a packet lost,
      expressed as a fraction of an RTT.

   kMinTLPTimeout (default 10ms):  Minimum time in the future a tail
      loss probe alarm may be set for.

   kMinRTOTimeout (default 200ms):  Minimum time in the future an RTO
      alarm may be set for.

   kDelayedAckTimeout (default 25ms):  The length of the peer's delayed
      ack timer.

   kDefaultInitialRtt (default 100ms):  The default RTT used before an
      RTT sample is taken.

3.2.2.  Variables of interest

   Variables required to implement the loss detection mechanisms are
   described in this section.

   loss_detection_alarm:  Multi-modal alarm used for loss detection.

   handshake_count:  The number of times the handshake packets have been
      retransmitted without receiving an ack.
   tlp_count:  The number of times a tail loss probe has been sent
      without receiving an ack.

   rto_count:  The number of times an RTO has been sent without
      receiving an ack.

   largest_sent_before_rto:  The last packet number sent prior to the
      first retransmission timeout.

   time_of_last_sent_packet:  The time the most recent packet was sent.

   latest_rtt:  The most recent RTT measurement made when receiving an
      ack for a previously unacked packet.

   smoothed_rtt:  The smoothed RTT of the connection, computed as
      described in [RFC6298].

   rttvar:  The RTT variance, computed as described in [RFC6298].

   reordering_threshold:  The largest delta between the largest acked
      retransmittable packet and a packet containing retransmittable
      frames before it's declared lost.

   time_reordering_fraction:  The reordering window as a fraction of
      max(smoothed_rtt, latest_rtt).

   loss_time:  The time at which the next packet will be considered lost
      based on early retransmit or exceeding the reordering window in
      time.

   sent_packets:  An association of packet numbers to information about
      them, including a number field indicating the packet number, a
      time field indicating the time a packet was sent, and a bytes
      field indicating the packet's size.  sent_packets is ordered by
      packet number, and packets remain in sent_packets until
      acknowledged or lost.

3.2.3.  Initialization

   At the beginning of the connection, initialize the loss detection
   variables as follows:

   loss_detection_alarm.reset()
   handshake_count = 0
   tlp_count = 0
   rto_count = 0
   if (UsingTimeLossDetection()):
     reordering_threshold = infinite
     time_reordering_fraction = kTimeReorderingFraction
   else:
     reordering_threshold = kReorderingThreshold
     time_reordering_fraction = infinite
   loss_time = 0
   smoothed_rtt = 0
   rttvar = 0
   largest_sent_before_rto = 0
   time_of_last_sent_packet = 0

3.2.4.
On Sending a Packet

   After any packet is sent, be it a new transmission or a rebundled
   transmission, the following OnPacketSent function is called.  The
   parameters to OnPacketSent are as follows:

   o  packet_number: The packet number of the sent packet.

   o  is_retransmittable: A boolean that indicates whether the packet
      contains at least one frame requiring reliable delivery.  The
      retransmittability of various QUIC frames is described in
      [QUIC-TRANSPORT].  If false, it is still acceptable for an ack to
      be received for this packet.  However, a caller MUST NOT set
      is_retransmittable to true if an ack is not expected.

   o  sent_bytes: The number of bytes sent in the packet.

   Pseudocode for OnPacketSent follows:

   OnPacketSent(packet_number, is_retransmittable, sent_bytes):
     time_of_last_sent_packet = now
     sent_packets[packet_number].packet_number = packet_number
     sent_packets[packet_number].time = now
     if is_retransmittable:
       sent_packets[packet_number].bytes = sent_bytes
       SetLossDetectionAlarm()

3.2.5.  On Ack Receipt

   When an ack is received, it may acknowledge 0 or more packets.

   Pseudocode for OnAckReceived and UpdateRtt follows:

   OnAckReceived(ack):
     // If the largest acked is newly acked, update the RTT.
     if (sent_packets[ack.largest_acked]):
       latest_rtt = now - sent_packets[ack.largest_acked].time
       // Subtract the peer's reported ack delay, but never let
       // the sample go negative.
       if (latest_rtt > ack.ack_delay):
         latest_rtt -= ack.ack_delay
       UpdateRtt(latest_rtt)
     // Find all newly acked packets.
     for acked_packet in DetermineNewlyAckedPackets():
       OnPacketAcked(acked_packet.packet_number)

     DetectLostPackets(ack.largest_acked_packet)
     SetLossDetectionAlarm()

   UpdateRtt(latest_rtt):
     // Based on [RFC6298].
     if (smoothed_rtt == 0):
       smoothed_rtt = latest_rtt
       rttvar = latest_rtt / 2
     else:
       // The variance term uses the absolute deviation.
       rttvar = 3/4 * rttvar + 1/4 * abs(smoothed_rtt - latest_rtt)
       smoothed_rtt = 7/8 * smoothed_rtt + 1/8 * latest_rtt

3.2.6.
On Packet Acknowledgment

   When a packet is acked for the first time, the following
   OnPacketAcked function is called.  Note that a single ACK frame may
   newly acknowledge several packets.  OnPacketAcked must be called once
   for each of these newly acked packets.

   OnPacketAcked takes one parameter, acked_packet_number, which is the
   packet number of the newly acked packet, and returns a list of packet
   numbers that are detected as lost.

   If this is the first acknowledgement following RTO, check if the
   smallest newly acknowledged packet is one sent by the RTO, and if so,
   inform congestion control of a verified RTO, similar to F-RTO
   [RFC5682].

   Pseudocode for OnPacketAcked follows:

   OnPacketAcked(acked_packet_number):
     // If a packet sent prior to RTO was acked, then the RTO
     // was spurious.  Otherwise, inform congestion control.
     if (rto_count > 0 &&
         acked_packet_number > largest_sent_before_rto):
       OnRetransmissionTimeoutVerified()
     handshake_count = 0
     tlp_count = 0
     rto_count = 0
     sent_packets.remove(acked_packet_number)

3.2.7.  Setting the Loss Detection Alarm

   QUIC loss detection uses a single alarm for all timer-based loss
   detection.  The duration of the alarm is based on the alarm's mode,
   which is set in the packet and timer events further below.  The
   function SetLossDetectionAlarm defined below shows how the single
   timer is set based on the alarm mode.

3.2.7.1.  Handshake Packets

   The initial flight has no prior RTT sample.  A client SHOULD remember
   the previous RTT it observed when resumption is attempted and use
   that for an initial RTT value.  If no previous RTT is available, the
   initial RTT defaults to kDefaultInitialRtt (100ms).

   Endpoints MUST retransmit handshake frames if not acknowledged within
   a time limit.  This time limit will start as the largest of twice the
   RTT value and kMinTLPTimeout.
   Each consecutive handshake
   retransmission doubles the time limit, until an acknowledgement is
   received.

   Handshake frames may be cancelled by handshake state transitions.  In
   particular, all non-protected frames SHOULD no longer be transmitted
   once packet protection is available.

   When stateless rejects are in use, the connection is considered
   immediately closed once a reject is sent, so no timer is set to
   retransmit the reject.

   Version negotiation packets are always stateless, and MUST be sent
   once per handshake packet that uses an unsupported QUIC version,
   and MAY be sent in response to 0RTT packets.

3.2.7.2.  Tail Loss Probe and Retransmission Timeout

   Tail loss probes [LOSS-PROBE] and retransmission timeouts [RFC6298]
   are alarm-based mechanisms to recover from cases when there are
   outstanding retransmittable packets, but an acknowledgement has not
   been received in a timely manner.

3.2.7.3.  Early Retransmit

   Early retransmit [RFC5827] is implemented with a 1/4 RTT timer.  It
   is part of QUIC's time based loss detection, but is always enabled,
   even when only packet reordering loss detection is enabled.

3.2.7.4.  Pseudocode

   Pseudocode for SetLossDetectionAlarm follows:

   SetLossDetectionAlarm():
     if (retransmittable packets are not outstanding):
       loss_detection_alarm.cancel()
       return

     if (handshake packets are outstanding):
       // Handshake retransmission alarm.
       if (smoothed_rtt == 0):
         alarm_duration = 2 * kDefaultInitialRtt
       else:
         alarm_duration = 2 * smoothed_rtt
       alarm_duration = max(alarm_duration, kMinTLPTimeout)
       alarm_duration = alarm_duration * (2 ^ handshake_count)
     else if (loss_time != 0):
       // Early retransmit timer or time loss detection.
       alarm_duration = loss_time - now
     else if (tlp_count < kMaxTLPs):
       // Tail Loss Probe
       if (retransmittable_packets_outstanding == 1):
         alarm_duration = 1.5 * smoothed_rtt + kDelayedAckTimeout
       else:
         alarm_duration = kMinTLPTimeout
       alarm_duration = max(alarm_duration, 2 * smoothed_rtt)
     else:
       // RTO alarm
       alarm_duration = smoothed_rtt + 4 * rttvar
       alarm_duration = max(alarm_duration, kMinRTOTimeout)
       alarm_duration = alarm_duration * (2 ^ rto_count)

     loss_detection_alarm.set(now + alarm_duration)

3.2.8.  On Alarm Firing

   QUIC uses one loss recovery alarm, which when set, can be in one of
   several modes.  When the alarm fires, the mode determines the action
   to be performed.

   Pseudocode for OnLossDetectionAlarm follows:

   OnLossDetectionAlarm():
     if (handshake packets are outstanding):
       // Handshake retransmission alarm.
       RetransmitAllHandshakePackets()
       handshake_count++
     else if (loss_time != 0):
       // Early retransmit or Time Loss Detection
       DetectLostPackets(largest_acked_packet)
     else if (tlp_count < kMaxTLPs):
       // Tail Loss Probe.
       SendOnePacket()
       tlp_count++
     else:
       // RTO.
       if (rto_count == 0):
         largest_sent_before_rto = largest_sent_packet
       SendTwoPackets()
       rto_count++

     SetLossDetectionAlarm()

3.2.9.  Detecting Lost Packets

   Packets in QUIC are only considered lost once a larger packet number
   is acknowledged.  DetectLostPackets is called every time an ack is
   received.  If the loss detection alarm fires and the loss_time is
   set, the previous largest acked packet is supplied.

3.2.9.1.  Handshake Packets

   The receiver MUST ignore unprotected packets that ack protected
   packets.  The receiver MUST trust protected acks for unprotected
   packets, however.  Aside from this, loss detection for handshake
   packets when an ack is processed is identical to other packets.

3.2.9.2.
Pseudocode

   DetectLostPackets takes one parameter, largest_acked, which is the
   largest acked packet.

   Pseudocode for DetectLostPackets follows:

   DetectLostPackets(largest_acked):
     loss_time = 0
     lost_packets = {}
     delay_until_lost = infinite
     if (time_reordering_fraction != infinite):
       delay_until_lost =
         (1 + time_reordering_fraction) * max(latest_rtt, smoothed_rtt)
     else if (largest_acked.packet_number == largest_sent_packet):
       // Early retransmit alarm: one RTT plus the 1/4 RTT timer.
       delay_until_lost = 5/4 * max(latest_rtt, smoothed_rtt)
     foreach (unacked < largest_acked.packet_number):
       time_since_sent = now - unacked.time
       packet_delta =
         largest_acked.packet_number - unacked.packet_number
       if (time_since_sent > delay_until_lost):
         lost_packets.insert(unacked)
       else if (packet_delta > reordering_threshold):
         lost_packets.insert(unacked)
       else if (loss_time == 0 && delay_until_lost != infinite):
         loss_time = now + delay_until_lost - time_since_sent

     // Inform the congestion controller of lost packets and
     // let it decide whether to retransmit immediately.
     if (!lost_packets.empty()):
       OnPacketsLost(lost_packets)
       foreach (packet in lost_packets):
         sent_packets.remove(packet.packet_number)

3.3.  Discussion

   The majority of constants were derived from best common practices
   among widely deployed TCP implementations on the Internet.
   Exceptions follow.

   A shorter delayed ack time of 25ms was chosen because longer delayed
   acks can delay loss recovery and, for the small number of connections
   where fewer than one packet per 25ms is delivered, acking every
   packet is beneficial to congestion control and loss recovery.

   The default initial RTT of 100ms was chosen because it is slightly
   higher than both the median and mean min_rtt typically observed on
   the public Internet.

4.
Congestion Control

   QUIC's congestion control is based on TCP NewReno [RFC6582]
   congestion control to determine the congestion window and pacing
   rate.

4.1.  Slow Start

   QUIC begins every connection in slow start and exits slow start upon
   loss.  While in slow start, QUIC increases the congestion window by
   the number of acknowledged bytes when each ack is processed.

4.2.  Recovery

   Recovery is a period of time beginning with detection of a lost
   packet.  It ends when all packets outstanding at the time recovery
   began have been acknowledged or lost.  During recovery, the
   congestion window is not increased or decreased.

4.3.  Constants of interest

   Constants used in congestion control are based on a combination of
   RFCs, papers, and common practice.  Some may need to be changed or
   negotiated in order to better suit a variety of environments.

   kDefaultMss (default 1460 bytes):  The default max packet size used
      for calculating default and minimum congestion windows.

   kInitialWindow (default 10 * kDefaultMss):  Default limit on the
      amount of outstanding data in bytes.

   kMinimumWindow (default 2 * kDefaultMss):  Default minimum congestion
      window.

   kLossReductionFactor (default 0.5):  Reduction in congestion window
      when a new loss event is detected.

4.4.  Variables of interest

   Variables required to implement the congestion control mechanisms are
   described in this section.

   bytes_in_flight:  The sum of the size in bytes of all sent packets
      that contain at least one retransmittable frame, and have not been
      acked or declared lost.

   congestion_window:  Maximum number of bytes in flight that may be
      sent.

   end_of_recovery:  The packet number after which QUIC will no longer
      be in recovery.

   ssthresh:  Slow start threshold in bytes.  When the congestion window
      is below ssthresh, it grows by the number of bytes acknowledged
      for each ack.

4.5.
Initialization

   At the beginning of the connection, initialize the congestion control
   variables as follows:

   congestion_window = kInitialWindow
   bytes_in_flight = 0
   end_of_recovery = 0
   ssthresh = infinite

4.6.  On Packet Acknowledgement

   Invoked at the same time loss detection's OnPacketAcked is called and
   supplied with the acked_packet from sent_packets.

   Pseudocode for OnPacketAcked follows:

   OnPacketAcked(acked_packet):
     // Do not grow the window while still in recovery.
     if (acked_packet.packet_number < end_of_recovery):
       return
     if (congestion_window < ssthresh):
       // Slow start: grow by the number of bytes acknowledged.
       congestion_window += acked_packet.bytes
     else:
       // Congestion avoidance: grow by about one kDefaultMss
       // per congestion window of acknowledged bytes.
       congestion_window +=
         kDefaultMss * acked_packet.bytes / congestion_window

4.7.  On Packets Lost

   Invoked by loss detection from DetectLostPackets when new packets are
   detected lost.

   OnPacketsLost(lost_packets):
     largest_lost_packet = lost_packets.last()
     // Start a new recovery epoch if the lost packet is larger
     // than the end of the previous recovery epoch.
     if (end_of_recovery < largest_lost_packet.packet_number):
       end_of_recovery = largest_sent_packet
       congestion_window *= kLossReductionFactor
       congestion_window = max(congestion_window, kMinimumWindow)
       ssthresh = congestion_window

4.8.  On Retransmission Timeout Verified

   QUIC decreases the congestion window to the minimum value once the
   retransmission timeout has been confirmed to not be spurious when the
   first post-RTO acknowledgement is processed.

   OnRetransmissionTimeoutVerified():
     congestion_window = kMinimumWindow

4.9.  Pacing Packets

   QUIC sends a packet if there is available congestion window and
   sending the packet does not exceed the pacing rate.

   TimeToSend returns infinite if the congestion controller is
   congestion window limited, a time in the past if the packet can be
   sent immediately, and a time in the future if sending is pacing
   limited.
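
   The three outcomes described above can be illustrated with a small
   Python sketch.  It is not part of the draft: the function name
   time_to_send, passing connection state as explicit arguments, the
   use of float seconds for times, and math.inf standing in for
   "infinite" are all assumptions made for illustration.

```python
import math


def time_to_send(packet_size, bytes_in_flight, congestion_window,
                 smoothed_rtt, time_of_last_sent_packet):
    """Return the earliest time (in seconds) the packet may be sent.

    Hypothetical helper: the draft keeps these values as connection
    state; here they are plain arguments so the sketch is
    self-contained.
    """
    # Congestion window limited: the packet does not fit, so there is
    # no finite send time until acks free up window.
    if bytes_in_flight + packet_size > congestion_window:
        return math.inf
    # Pacing: spread one congestion window of data over one smoothed
    # RTT, so each packet is spaced in proportion to its size.
    pacing_interval = packet_size * smoothed_rtt / congestion_window
    return time_of_last_sent_packet + pacing_interval
```

   A caller would compare the result with the current time: a value at
   or before now means the packet can go out immediately, a later value
   is how long pacing asks the sender to wait, and math.inf means the
   sender must wait for acknowledgements.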
   TimeToSend(packet_size):
     if (bytes_in_flight + packet_size > congestion_window):
       return infinite
     return time_of_last_sent_packet +
       (packet_size * smoothed_rtt) / congestion_window

5.  IANA Considerations

   This document has no IANA actions.  Yet.

6.  References

6.1.  Normative References

   [QUIC-TRANSPORT]
              Iyengar, J., Ed. and M. Thomson, Ed., "QUIC: A UDP-Based
              Multiplexed and Secure Transport",
              draft-ietf-quic-transport (work in progress), June 2017.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

6.2.  Informative References

   [LOSS-PROBE]
              Dukkipati, N., Cardwell, N., Cheng, Y., and M. Mathis,
              "Tail Loss Probe (TLP): An Algorithm for Fast Recovery of
              Tail Losses", draft-dukkipati-tcpm-tcp-loss-probe-01 (work
              in progress), February 2013.

   [RFC5682]  Sarolahti, P., Kojo, M., Yamamoto, K., and M. Hata,
              "Forward RTO-Recovery (F-RTO): An Algorithm for Detecting
              Spurious Retransmission Timeouts with TCP", RFC 5682,
              DOI 10.17487/RFC5682, September 2009.

   [RFC5827]  Allman, M., Avrachenkov, K., Ayesta, U., Blanton, J., and
              P. Hurtig, "Early Retransmit for TCP and Stream Control
              Transmission Protocol (SCTP)", RFC 5827,
              DOI 10.17487/RFC5827, May 2010.

   [RFC6298]  Paxson, V., Allman, M., Chu, J., and M. Sargent,
              "Computing TCP's Retransmission Timer", RFC 6298,
              DOI 10.17487/RFC6298, June 2011.

   [RFC6582]  Henderson, T., Floyd, S., Gurtov, A., and Y. Nishida, "The
              NewReno Modification to TCP's Fast Recovery Algorithm",
              RFC 6582, DOI 10.17487/RFC6582, April 2012.

Appendix A.  Acknowledgments

Appendix B.  Change Log

      *RFC Editor's Note:* Please remove this section prior to
      publication of a final version of this document.

B.1.
Since draft-ietf-quic-recovery-02

   o  Integrate F-RTO (#544, #409)

   o  Add congestion control (#545, #395)

   o  Require connection abort if a skipped packet was acknowledged
      (#415)

   o  Simplify RTO calculations (#142, #417)

B.2.  Since draft-ietf-quic-recovery-01

   o  Overview added to loss detection

   o  Changes initial default RTT to 100ms

   o  Added time-based loss detection and fixes early retransmit

   o  Clarified loss recovery for handshake packets

   o  Fixed references and made TCP references informative

B.3.  Since draft-ietf-quic-recovery-00

   o  Improved description of constants and ACK behavior

B.4.  Since draft-iyengar-quic-loss-recovery-01

   o  Adopted as base for draft-ietf-quic-recovery

   o  Updated authors/editors list

   o  Added table of contents

Authors' Addresses

   Jana Iyengar (editor)
   Google

   Email: jri@google.com

   Ian Swett (editor)
   Google

   Email: ianswett@google.com