idnits 2.17.1 draft-ietf-quic-recovery-03.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack a Security Considerations section. ** There is 1 instance of too long lines in the document, the longest one being 1 character in excess of 72. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year == The document seems to lack the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. (The document does seem to have the reference to RFC 2119 which the ID-Checklist requires). -- The document date (May 21, 2017) is 2504 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) No issues found here. Summary: 2 errors (**), 0 flaws (~~), 2 warnings (==), 1 comment (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 QUIC J. Iyengar, Ed. 3 Internet-Draft I. Swett, Ed. 4 Intended status: Standards Track Google 5 Expires: November 22, 2017 May 21, 2017 7 QUIC Loss Detection and Congestion Control 8 draft-ietf-quic-recovery-03 10 Abstract 12 This document describes loss detection and congestion control 13 mechanisms for QUIC. 15 Note to Readers 17 Discussion of this draft takes place on the QUIC working group 18 mailing list (quic@ietf.org), which is archived at 19 https://mailarchive.ietf.org/arch/search/?email_list=quic . 21 Working Group information can be found at https://github.com/quicwg ; 22 source code and issues list for this draft can be found at 23 https://github.com/quicwg/base-drafts/labels/recovery . 25 Status of This Memo 27 This Internet-Draft is submitted in full conformance with the 28 provisions of BCP 78 and BCP 79. 30 Internet-Drafts are working documents of the Internet Engineering 31 Task Force (IETF). Note that other groups may also distribute 32 working documents as Internet-Drafts. The list of current Internet- 33 Drafts is at http://datatracker.ietf.org/drafts/current/. 35 Internet-Drafts are draft documents valid for a maximum of six months 36 and may be updated, replaced, or obsoleted by other documents at any 37 time. It is inappropriate to use Internet-Drafts as reference 38 material or to cite them other than as "work in progress." 40 This Internet-Draft will expire on November 22, 2017. 42 Copyright Notice 44 Copyright (c) 2017 IETF Trust and the persons identified as the 45 document authors. All rights reserved. 47 This document is subject to BCP 78 and the IETF Trust's Legal 48 Provisions Relating to IETF Documents 49 (http://trustee.ietf.org/license-info) in effect on the date of 50 publication of this document. Please review these documents 51 carefully, as they describe your rights and restrictions with respect 52 to this document. Code Components extracted from this document must 53 include Simplified BSD License text as described in Section 4.e of 54 the Trust Legal Provisions and are provided without warranty as 55 described in the Simplified BSD License. 57 Table of Contents 59 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 60 1.1. Notational Conventions . . . . . . . . . . . . . . . . . 3 61 2. Design of the QUIC Transmission Machinery . . . . . . . . . . 3 62 2.1. Relevant Differences Between QUIC and TCP . . . . . . . . 4 63 2.1.1. Monotonically Increasing Packet Numbers . . . . . . . 4 64 2.1.2. No Reneging . . . . . . . . . . . . . . . . . . . . . 4 65 2.1.3. More ACK Ranges . . . . . . . . . . . . . . . . . . . 5 66 2.1.4. Explicit Correction For Delayed Acks . . . . . . . . 5 67 3. Loss Detection . . . . . . . . . . . . . . . . . . . . . . . 5 68 3.1. Overview . . . . . . . . . . . . . . . . . . . . . . . . 5 69 3.2. Algorithm Details . . . . . . . . . . . . . . . . . . . . 6 70 3.2.1. Constants of interest . . . . . . . . . . . . . . . . 6 71 3.2.2. Variables of interest . . . . . . . . . . . . . . . . 6 72 3.2.3. Initialization . . . . . . . . . . . . . . . . . . . 7 73 3.2.4. On Sending a Packet . . . . . . . . . . . . . . . . . 8 74 3.2.5. On Ack Receipt . . . . . . . . . . . . . . . . . . . 8 75 3.2.6. On Packet Acknowledgment . . . . . . . . . . . . . . 9 76 3.2.7. Setting the Loss Detection Alarm . . . . . . . . . . 10 77 3.2.8. On Alarm Firing . . . . . . . . . . . . . . . . . . . 12 78 3.2.9. Detecting Lost Packets . . . . . . . . . . . . . . . 12 79 3.3. Discussion . . . . . . . . . . . . . . . . . . . . . . . 13 80 4. Congestion Control . . . . . . . . . . . . . . . . . . . . . 14 81 4.1. Slow Start . . . . . . . . . . . . . . . . . . . . . . . 14 82 4.2. Recovery . . . . . . . . . . . . . . . . . . . . . . . . 14 83 4.3. Constants of interest . . . . . . . . . . . . . . . . . . 14 84 4.4. Variables of interest . . . . . . . . . . . . . . . . . . 14 85 4.5. Initialization . . . . . . . . . . . . . . . . . . . . . 15 86 4.6. On Packet Acknowledgement . . . . . . . . . . . . . . . . 15 87 4.7. On Packets Lost . . . . . . . . . . . . . . . . . . . . . 15 88 4.8. On Retransmission Timeout Verified . . . . . . . . . . . 16 89 4.9. Pacing Packets . . . . . . . . . . . . . . . . . . . . . 16 90 5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 16 91 6. References . . . . . . . . . . . . . . . . . . . . . . . . . 16 92 6.1. Normative References . . . . . . . . . . . . . . . . . . 16 93 6.2. Informative References . . . . . . . . . . . . . . . . . 16 94 Appendix A. Acknowledgments . . . . . . . . . . . . . . . . . . 17 95 Appendix B. Change Log . . . . . . . . . . . . . . . . . . . . . 17 96 B.1. Since draft-ietf-quic-recovery-02 . . . . . . . . . . . . 17 97 B.2. Since draft-ietf-quic-recovery-01 . . . . . . . . . . . . 18 98 B.3. Since draft-ietf-quic-recovery-00 . . . . . . . . . . . . 18 99 B.4. Since draft-iyengar-quic-loss-recovery-01 . . . . . . . . 18 100 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 18 102 1. Introduction 104 QUIC is a new multiplexed and secure transport atop UDP. QUIC builds 105 on decades of transport and security experience, and implements 106 mechanisms that make it attractive as a modern general-purpose 107 transport. The QUIC protocol is described in [QUIC-TRANSPORT]. 109 QUIC implements the spirit of known TCP loss recovery mechanisms, 110 described in RFCs, various Internet-drafts, and also those prevalent 111 in the Linux TCP implementation. This document describes QUIC 112 congestion control and loss recovery, and where applicable, 113 attributes the TCP equivalent in RFCs, Internet-drafts, academic 114 papers, and/or TCP implementations. 116 1.1. Notational Conventions 118 The words "MUST", "MUST NOT", "SHOULD", and "MAY" are used in this 119 document. It's not shouting; when they are capitalized, they have 120 the special meaning defined in [RFC2119]. 122 2. Design of the QUIC Transmission Machinery 124 All transmissions in QUIC are sent with a packet-level header, which 125 includes a packet sequence number (referred to below as a packet 126 number). These packet numbers never repeat in the lifetime of a 127 connection, and are monotonically increasing, which makes duplicate 128 detection trivial. This fundamental design decision obviates the 129 need for disambiguating between transmissions and retransmissions and 130 eliminates significant complexity from QUIC's interpretation of TCP 131 loss detection mechanisms. 133 Every packet may contain several frames. We outline the frames that 134 are important to the loss detection and congestion control machinery 135 below. 137 o Retransmittable frames are frames requiring reliable delivery. 138 The most common are STREAM frames, which typically contain 139 application data. 141 o Crypto handshake data is sent on stream 0, and uses the 142 reliability machinery of QUIC underneath. 144 o ACK frames contain acknowledgment information. QUIC uses a SACK- 145 based scheme, where acks express up to 256 ranges. The ACK frame 146 also includes a receive timestamp for each packet newly acked. 148 2.1. Relevant Differences Between QUIC and TCP 150 Readers familiar with TCP's loss detection and congestion control 151 will find algorithms here that parallel well-known TCP ones. 152 Protocol differences between QUIC and TCP however contribute to 153 algorithmic differences. We briefly describe these protocol 154 differences below. 156 2.1.1. Monotonically Increasing Packet Numbers 158 TCP conflates transmission sequence number at the sender with 159 delivery sequence number at the receiver, which results in 160 retransmissions of the same data carrying the same sequence number, 161 and consequently to problems caused by "retransmission ambiguity". 162 QUIC separates the two: QUIC uses a packet sequence number (referred 163 to as the "packet number") for transmissions, and any data that is to 164 be delivered to the receiving application(s) is sent in one or more 165 streams, with stream offsets encoded within STREAM frames inside of 166 packets that determine delivery order. 168 QUIC's packet number is strictly increasing, and directly encodes 169 transmission order. A higher QUIC packet number signifies that the 170 packet was sent later, and a lower QUIC packet number signifies that 171 the packet was sent earlier. When a packet containing frames is 172 deemed lost, QUIC rebundles necessary frames in a new packet with a 173 new packet number, removing ambiguity about which packet is 174 acknowledged when an ACK is received. Consequently, more accurate 175 RTT measurements can be made, spurious retransmissions are trivially 176 detected, and mechanisms such as Fast Retransmit can be applied 177 universally, based only on packet number. 179 This design point significantly simplifies loss detection mechanisms 180 for QUIC. Most TCP mechanisms implicitly attempt to infer 181 transmission ordering based on TCP sequence numbers - a non-trivial 182 task, especially when TCP timestamps are not available. 184 2.1.2. No Reneging 186 QUIC ACKs contain information that is equivalent to TCP SACK, but 187 QUIC does not allow any acked packet to be reneged, greatly 188 simplifying implementations on both sides and reducing memory 189 pressure on the sender. 191 2.1.3. More ACK Ranges 193 QUIC supports up to 256 ACK ranges, opposed to TCP's 3 SACK ranges. 194 In high loss environments, this speeds recovery. 196 2.1.4. Explicit Correction For Delayed Acks 198 QUIC ACKs explicitly encode the delay incurred at the receiver 199 between when a packet is received and when the corresponding ACK is 200 sent. This allows the receiver of the ACK to adjust for receiver 201 delays, specifically the delayed ack timer, when estimating the path 202 RTT. This mechanism also allows a receiver to measure and report the 203 delay from when a packet was received by the OS kernel, which is 204 useful in receivers which may incur delays such as context-switch 205 latency before a userspace QUIC receiver processes a received packet. 207 3. Loss Detection 209 3.1. Overview 211 QUIC uses a combination of ack information and alarms to detect lost 212 packets. An unacknowledged QUIC packet is marked as lost in one of 213 the following ways: 215 o A packet is marked as lost if at least one packet that was sent a 216 threshold number of packets (kReorderingThreshold) after it has 217 been acknowledged. This indicates that the unacknowledged packet 218 is either lost or reordered beyond the specified threshold. This 219 mechanism combines both TCP's FastRetransmit and FACK mechanisms. 221 o If a packet is near the tail, where fewer than 222 kReorderingThreshold packets are sent after it, the sender cannot 223 expect to detect loss based on the previous mechanism. In this 224 case, a sender uses both ack information and an alarm to detect 225 loss. Specifically, when the last sent packet is acknowledged, 226 the sender waits a short period of time to allow for reordering 227 and then marks any unacknowledged packets as lost. This mechanism 228 is based on the Linux implementation of TCP Early Retransmit. 230 o If a packet is sent at the tail, there are no packets sent after 231 it, and the sender cannot use ack information to detect its loss. 232 The sender therefore relies on an alarm to detect such tail 233 losses. This mechanism is based on TCP's Tail Loss Probe. 235 o If all else fails, a Retransmission Timeout (RTO) alarm is always 236 set when any retransmittable packet is outstanding. When this 237 alarm fires, all unacknowledged packets are marked as lost. 239 o Instead of a packet threshold to tolerate reordering, a QUIC 240 sender may use a time threshold. This allows for senders to be 241 tolerant of short periods of significant reordering. In this 242 mechanism, a QUIC sender marks a packet as lost when a packet 243 larger than it is acknowledged and a threshold amount of time has 244 passed since the packet was sent. 246 o Handshake packets, which contain STREAM frames for stream 0, are 247 critical to QUIC transport and crypto negotiation, so a separate 248 alarm period is used for them. 250 3.2. Algorithm Details 252 3.2.1. Constants of interest 254 Constants used in loss recovery are based on a combination of RFCs, 255 papers, and common practice. Some may need to be changed or 256 negotiated in order to better suit a variety of environments. 258 kMaxTLPs (default 2): Maximum number of tail loss probes before an 259 RTO fires. 261 kReorderingThreshold (default 3): Maximum reordering in packet 262 number space before FACK style loss detection considers a packet 263 lost. 265 kTimeReorderingFraction (default 1/8): Maximum reordering in time 266 space before time based loss detection considers a packet lost. 267 In fraction of an RTT. 269 kMinTLPTimeout (default 10ms): Minimum time in the future a tail 270 loss probe alarm may be set for. 272 kMinRTOTimeout (default 200ms): Minimum time in the future an RTO 273 alarm may be set for. 275 kDelayedAckTimeout (default 25ms): The length of the peer's delayed 276 ack timer. 278 kDefaultInitialRtt (default 100ms): The default RTT used before an 279 RTT sample is taken. 281 3.2.2. Variables of interest 283 Variables required to implement the congestion control mechanisms are 284 described in this section. 286 loss_detection_alarm: Multi-modal alarm used for loss detection. 288 handshake_count: The number of times the handshake packets have been 289 retransmitted without receiving an ack. 291 tlp_count: The number of times a tail loss probe has been sent 292 without receiving an ack. 294 rto_count: The number of times an rto has been sent without 295 receiving an ack. 297 largest_sent_before_rto: The last packet number sent prior to the 298 first retransmission timeout. 300 smoothed_rtt: The smoothed RTT of the connection, computed as 301 described in [RFC6298] 303 rttvar: The RTT variance, computed as described in [RFC6298] 305 reordering_threshold: The largest delta between the largest acked 306 retransmittable packet and a packet containing retransmittable 307 frames before it's declared lost. 309 time_reordering_fraction: The reordering window as a fraction of 310 max(smoothed_rtt, latest_rtt). 312 loss_time: The time at which the next packet will be considered lost 313 based on early transmit or exceeding the reordering window in 314 time. 316 sent_packets: An association of packet numbers to information about 317 them, including a number field indicating the packet number, a 318 time field indicating the time a packet was sent, and a bytes 319 field indicating the packet's size. sent_packets is ordered by 320 packet number, and packets remain in sent_packets until 321 acknowledged or lost. 323 3.2.3. Initialization 325 At the beginning of the connection, initialize the loss detection 326 variables as follows: 328 loss_detection_alarm.reset() 329 handshake_count = 0 330 tlp_count = 0 331 rto_count = 0 332 if (UsingTimeLossDetection()) 333 reordering_threshold = infinite 334 time_reordering_fraction = kTimeReorderingFraction 335 else: 336 reordering_threshold = kReorderingThreshold 337 time_reordering_fraction = infinite 338 loss_time = 0 339 smoothed_rtt = 0 340 rttvar = 0 341 largest_sent_before_rto = 0 343 3.2.4. On Sending a Packet 345 After any packet is sent, be it a new transmission or a rebundled 346 transmission, the following OnPacketSent function is called. The 347 parameters to OnPacketSent are as follows: 349 o packet_number: The packet number of the sent packet. 351 o is_retransmittable: A boolean that indicates whether the packet 352 contains at least one frame requiring reliable deliver. The 353 retransmittability of various QUIC frames is described in 354 [QUIC-TRANSPORT]. If false, it is still acceptable for an ack to 355 be received for this packet. However, a caller MUST NOT set 356 is_retransmittable to true if an ack is not expected. 358 o sent_bytes: The number of bytes sent in the packet. 360 Pseudocode for OnPacketSent follows: 362 OnPacketSent(packet_number, is_retransmittable, sent_bytes): 363 sent_packets[packet_number].packet_number = packet_number 364 sent_packets[packet_number].time = now 365 if is_retransmittable: 366 sent_packets[packet_number].bytes = sent_bytes 367 SetLossDetectionAlarm() 369 3.2.5. On Ack Receipt 371 When an ack is received, it may acknowledge 0 or more packets. 373 The sender MUST abort the connection if it receives an ACK for a 374 packet it never sent, see [QUIC-TRANSPORT]. 376 Pseudocode for OnAckReceived and UpdateRtt follow: 378 OnAckReceived(ack): 379 // If the largest acked is newly acked, update the RTT. 380 if (sent_packets[ack.largest_acked]): 381 rtt_sample = now - sent_packets[ack.largest_acked].time 382 if (rtt_sample > ack.ack_delay): 383 rtt_sample -= ack.delay 384 UpdateRtt(rtt_sample) 385 // The sender may skip packets for detecting optimistic ACKs 386 if (packets acked that the sender skipped): 387 abortConnection() 388 // Find all newly acked packets. 389 for acked_packet in DetermineNewlyAckedPackets(): 390 OnPacketAcked(acked_packet.packet_number) 392 DetectLostPackets(ack.largest_acked_packet) 393 SetLossDetectionAlarm() 395 UpdateRtt(rtt_sample): 396 // Based on {{RFC6298}}. 397 if (smoothed_rtt == 0): 398 smoothed_rtt = rtt_sample 399 rttvar = rtt_sample / 2 400 else: 401 rttvar = 3/4 * rttvar + 1/4 * (smoothed_rtt - rtt_sample) 402 smoothed_rtt = 7/8 * smoothed_rtt + 1/8 * rtt_sample 404 3.2.6. On Packet Acknowledgment 406 When a packet is acked for the first time, the following 407 OnPacketAcked function is called. Note that a single ACK frame may 408 newly acknowledge several packets. OnPacketAcked must be called once 409 for each of these newly acked packets. 411 OnPacketAcked takes one parameter, acked_packet, which is the packet 412 number of the newly acked packet, and returns a list of packet 413 numbers that are detected as lost. 415 If this is the first acknowledgement following RTO, check if the 416 smallest newly acknowledged packet is one sent by the RTO, and if so, 417 inform congestion control of a verified RTO, similar to F-RTO 418 [RFC5682] 420 Pseudocode for OnPacketAcked follows: 422 OnPacketAcked(acked_packet_number): 423 // If a packet sent prior to RTO was acked, then the RTO 424 // was spurious. Otherwise, inform congestion control. 425 if (rto_count > 0 && 426 acked_packet_number > largest_sent_before_rto) 427 OnRetransmissionTimeoutVerified() 428 handshake_count = 0 429 tlp_count = 0 430 rto_count = 0 431 sent_packets.remove(acked_packet_number) 433 3.2.7. Setting the Loss Detection Alarm 435 QUIC loss detection uses a single alarm for all timer-based loss 436 detection. The duration of the alarm is based on the alarm's mode, 437 which is set in the packet and timer events further below. The 438 function SetLossDetectionAlarm defined below shows how the single 439 timer is set based on the alarm mode. 441 3.2.7.1. Handshake Packets 443 The initial flight has no prior RTT sample. A client SHOULD remember 444 the previous RTT it observed when resumption is attempted and use 445 that for an initial RTT value. If no previous RTT is available, the 446 initial RTT defaults to 200ms. 448 Endpoints MUST retransmit handshake frames if not acknowledged within 449 a time limit. This time limit will start as the largest of twice the 450 rtt value and MinTLPTimeout. Each consecutive handshake 451 retransmission doubles the time limit, until an acknowledgement is 452 received. 454 Handshake frames may be cancelled by handshake state transitions. In 455 particular, all non-protected frames SHOULD be no longer be 456 transmitted once packet protection is available. 458 When stateless rejects are in use, the connection is considered 459 immediately closed once a reject is sent, so no timer is set to 460 retransmit the reject. 462 Version negotiation packets are always stateless, and MUST be sent 463 once per per handshake packet that uses an unsupported QUIC version, 464 and MAY be sent in response to 0RTT packets. 466 3.2.7.2. Tail Loss Probe and Retransmission Timeout 468 Tail loss probes [LOSS-PROBE] and retransmission timeouts [RFC6298] 469 are an alarm based mechanism to recover from cases when there are 470 outstanding retransmittable packets, but an acknowledgement has not 471 been received in a timely manner. 473 3.2.7.3. Early Retransmit 475 Early retransmit [RFC5827] is implemented with a 1/4 RTT timer. It 476 is part of QUIC's time based loss detection, but is always enabled, 477 even when only packet reordering loss detection is enabled. 479 3.2.7.4. Pseudocode 481 Pseudocode for SetLossDetectionAlarm follows: 483 SetLossDetectionAlarm(): 484 if (retransmittable packets are not outstanding): 485 loss_detection_alarm.cancel() 486 return 488 if (handshake packets are outstanding): 489 // Handshake retransmission alarm. 490 if (smoothed_rtt == 0): 491 alarm_duration = 2 * kDefaultInitialRtt 492 else: 493 alarm_duration = 2 * smoothed_rtt 494 alarm_duration = max(alarm_duration, kMinTLPTimeout) 495 alarm_duration = alarm_duration * (2 ^ handshake_count) 496 else if (loss_time != 0): 497 // Early retransmit timer or time loss detection. 498 alarm_duration = loss_time - now 499 else if (tlp_count < kMaxTLPs): 500 // Tail Loss Probe 501 if (retransmittable_packets_outstanding = 1): 502 alarm_duration = 1.5 * smoothed_rtt + kDelayedAckTimeout 503 else: 504 alarm_duration = kMinTLPTimeout 505 alarm_duration = max(alarm_duration, 2 * smoothed_rtt) 506 else: 507 // RTO alarm 508 alarm_duration = smoothed_rtt + 4 * rttvar 509 alarm_duration = max(alarm_duration, kMinRTOTimeout) 510 alarm_duration = alarm_duration * (2 ^ rto_count) 512 loss_detection_alarm.set(now + alarm_duration) 514 3.2.8. On Alarm Firing 516 QUIC uses one loss recovery alarm, which when set, can be in one of 517 several modes. When the alarm fires, the mode determines the action 518 to be performed. 520 Pseudocode for OnLossDetectionAlarm follows: 522 OnLossDetectionAlarm(): 523 if (handshake packets are outstanding): 524 // Handshake retransmission alarm. 525 RetransmitAllHandshakePackets() 526 handshake_count++ 527 else if (loss_time != 0): 528 // Early retransmit or Time Loss Detection 529 DetectLostPackets(largest_acked_packet) 530 else if (tlp_count < kMaxTLPs): 531 // Tail Loss Probe. 532 SendOnePacket() 533 tlp_count++ 534 else: 535 // RTO. 536 if (rto_count == 0) 537 largest_sent_before_rto = largest_sent_packet 538 SendTwoPackets() 539 rto_count++ 541 SetLossDetectionAlarm() 543 3.2.9. Detecting Lost Packets 545 Packets in QUIC are only considered lost once a larger packet number 546 is acknowledged. DetectLostPackets is called every time an ack is 547 received. If the loss detection alarm fires and the loss_time is 548 set, the previous largest acked packet is supplied. 550 3.2.9.1. Handshake Packets 552 The receiver MUST ignore unprotected packets that ack protected 553 packets. The receiver MUST trust protected acks for unprotected 554 packets, however. Aside from this, loss detection for handshake 555 packets when an ack is processed is identical to other packets. 557 3.2.9.2. Pseudocode 559 DetectLostPackets takes one parameter, acked, which is the largest 560 acked packet. 562 Pseudocode for DetectLostPackets follows: 564 DetectLostPackets(largest_acked): 565 loss_time = 0 566 lost_packets = {} 567 delay_until_lost = infinite 568 if (time_reordering_fraction != infinite): 569 delay_until_lost = 570 (1 + time_reordering_fraction) * max(latest_rtt, smoothed_rtt) 571 else if (largest_acked.packet_number == largest_sent_packet): 572 // Early retransmit alarm. 573 delay_until_lost = 9/8 * max(latest_rtt, smoothed_rtt) 574 foreach (unacked < largest_acked.packet_number): 575 time_since_sent = now() - unacked.time_sent 576 packet_delta = largest_acked.packet_number - unacked.packet_number 577 if (time_since_sent > delay_until_lost): 578 lost_packets.insert(unacked) 579 else if (packet_delta > reordering_threshold) 580 lost_packets.insert(unacked) 581 else if (loss_time == 0 && delay_until_lost != infinite): 582 loss_time = now() + delay_until_lost - time_since_sent 584 // Inform the congestion controller of lost packets and 585 // lets it decide whether to retransmit immediately. 586 if (!lost_packets.empty()) 587 OnPacketsLost(lost_packets) 588 foreach (packet in lost_packets) 589 sent_packets.remove(packet.packet_number) 591 3.3. Discussion 593 The majority of constants were derived from best common practices 594 among widely deployed TCP implementations on the internet. 595 Exceptions follow. 597 A shorter delayed ack time of 25ms was chosen because longer delayed 598 acks can delay loss recovery and for the small number of connections 599 where less than packet per 25ms is delivered, acking every packet is 600 beneficial to congestion control and loss recovery. 602 The default initial RTT of 100ms was chosen because it is slightly 603 higher than both the median and mean min_rtt typically observed on 604 the public internet. 606 4. Congestion Control 608 QUIC's congestion control is based on TCP NewReno[RFC6582] congestion 609 control to determine the congestion window and pacing rate. 611 4.1. Slow Start 613 QUIC begins every connection in slow start and exits slow start upon 614 loss. While in slow start, QUIC increases the congestion window by 615 the number of acknowledged bytes when each ack is processed. 617 4.2. Recovery 619 Recovery is a period of time beginning with detection of a lost 620 packet. It ends when all packets outstanding at the time recovery 621 began have been acknowledged or lost. During recovery, the 622 congestion window is not increased or decreased. 624 4.3. Constants of interest 626 Constants used in congestion control are based on a combination of 627 RFCs, papers, and common practice. Some may need to be changed or 628 negotiated in order to better suit a variety of environments. 630 kDefaultMss (default 1460 bytes): The default max packet size used 631 for calculating default and minimum congestion windows. 633 kInitialWindow (default 10 * kDefaultMss): Default limit on the 634 amount of outstanding data in bytes. 636 kMinimumWindow (default 2 * kDefaultMss): Default minimum congestion 637 window. 639 kLossReductionFactor (default 0.5): Reduction in congestion window 640 when a new loss event is detected. 642 4.4. Variables of interest 644 Variables required to implement the congestion control mechanisms are 645 described in this section. 647 bytes_in_flight: The sum of the size in bytes of all sent packets 648 that contain at least one retransmittable frame, and have not been 649 acked or declared lost. 651 congestion_window: Maximum number of bytes in flight that may be 652 sent. 654 end_of_recovery: The packet number after which QUIC will no longer 655 be in recovery. 657 ssthresh Slow start threshold in bytes. When the congestion window 658 is below ssthresh, it grows by the number of bytes acknowledged 659 for each ack. 661 4.5. Initialization 663 At the beginning of the connection, initialize the loss detection 664 variables as follows: 666 congestion_window = kInitialWindow 667 bytes_in_flight = 0 668 end_of_recovery = 0 669 ssthresh = infinite 671 4.6. On Packet Acknowledgement 673 Invoked at the same time loss detection's OnPacketAcked is called and 674 supplied with the acked_packet from sent_packets. 676 Pseudocode for OnPacketAcked follows: 678 OnPacketAcked(acked_packet): 679 if (acked_packet.packet_number < end_of_recovery): 680 return 681 if (congestion_window < ssthresh): 682 congestion_window += acket_packets.bytes 683 else: 684 congestion_window += 685 acked_packets.bytes / congestion_window 687 4.7. On Packets Lost 689 Invoked by loss detection from DetectLostPackets when new packets are 690 detected lost. 692 OnPacketsLost(lost_packets): 693 largest_lost_packet = lost_packets.last() 694 // Start a new recovery epoch if the lost packet is larger 695 // than the end of the previous recovery epoch. 696 if (end_of_recovery < largest_lost_packet.packet_number): 697 end_of_recovery = largest_sent_packet 698 congestion_window *= kLossReductionFactor 699 congestion_window = max(congestion_window, kMinimumWindow) 700 ssthresh = congestion_window 702 4.8. On Retransmission Timeout Verified 704 QUIC decreases the congestion window to the minimum value once the 705 retransmission timeout has been confirmed to not be spurious when the 706 first post-RTO acknowledgement is processed. 708 OnRetransmissionTimeoutVerified() 709 congestion_window = kMinimumWindow 711 4.9. Pacing Packets 713 QUIC sends a packet if there is available congestion window and 714 sending the packet does not exceed the pacing rate. 716 TimeToSend returns infinite if the congestion controller is 717 congestion window limited, a time in the past if the packet can be 718 sent immediately, and a time in the future if sending is pacing 719 limited. 721 TimeToSend(packet_size): 722 if (bytes_in_flight + packet_size > congestion_window) 723 return infinite 724 return time_of_last_sent_packet + 725 (packet_size * smoothed_rtt) / congestion_window 727 5. IANA Considerations 729 This document has no IANA actions. Yet. 731 6. References 733 6.1. Normative References 735 [QUIC-TRANSPORT] 736 Iyengar, J., Ed. and M. Thomson, Ed., "QUIC: A UDP-Based 737 Multiplexed and Secure Transport", draft-ietf-quic- 738 transport (work in progress), May 2017. 740 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 741 Requirement Levels", BCP 14, RFC 2119, 742 DOI 10.17487/RFC2119, March 1997, 743 . 745 6.2. Informative References 747 [LOSS-PROBE] 748 Dukkipati, N., Cardwell, N., Cheng, Y., and M. Mathis, 749 "Tail Loss Probe (TLP): An Algorithm for Fast Recovery of 750 Tail Losses", draft-dukkipati-tcpm-tcp-loss-probe-01 (work 751 in progress), February 2013. 753 [RFC5682] Sarolahti, P., Kojo, M., Yamamoto, K., and M. Hata, 754 "Forward RTO-Recovery (F-RTO): An Algorithm for Detecting 755 Spurious Retransmission Timeouts with TCP", RFC 5682, 756 DOI 10.17487/RFC5682, September 2009, 757 . 759 [RFC5827] Allman, M., Avrachenkov, K., Ayesta, U., Blanton, J., and 760 P. Hurtig, "Early Retransmit for TCP and Stream Control 761 Transmission Protocol (SCTP)", RFC 5827, 762 DOI 10.17487/RFC5827, May 2010, 763 . 765 [RFC6298] Paxson, V., Allman, M., Chu, J., and M. Sargent, 766 "Computing TCP's Retransmission Timer", RFC 6298, 767 DOI 10.17487/RFC6298, June 2011, 768 . 770 [RFC6582] Henderson, T., Floyd, S., Gurtov, A., and Y. Nishida, "The 771 NewReno Modification to TCP's Fast Recovery Algorithm", 772 RFC 6582, DOI 10.17487/RFC6582, April 2012, 773 . 775 Appendix A. Acknowledgments 777 Appendix B. Change Log 779 *RFC Editor's Note:* Please remove this section prior to 780 publication of a final version of this document. 782 B.1. Since draft-ietf-quic-recovery-02 784 o Integrate F-RTO (#544, #409) 786 o Add congestion control (#545, #395) 788 o Require connection abort if a skipped packet was acknowledged 789 (#415) 791 o Simplify RTO calculations (#142, #417) 793 B.2. Since draft-ietf-quic-recovery-01 795 o Overview added to loss detection 797 o Changes initial default RTT to 100ms 799 o Added time-based loss detection and fixes early retransmit 801 o Clarified loss recovery for handshake packets 803 o Fixed references and made TCP references informative 805 B.3. Since draft-ietf-quic-recovery-00 807 o Improved description of constants and ACK behavior 809 B.4. Since draft-iyengar-quic-loss-recovery-01 811 o Adopted as base for draft-ietf-quic-recovery 813 o Updated authors/editors list 815 o Added table of contents 817 Authors' Addresses 819 Jana Iyengar (editor) 820 Google 822 Email: jri@google.com 824 Ian Swett (editor) 825 Google 827 Email: ianswett@google.com