QUIC                                                     J. Iyengar, Ed.
Internet-Draft                                             I. Swett, Ed.
Intended status: Standards Track                                  Google
Expires: March 26, 2018                               September 22, 2017


               QUIC Loss Detection and Congestion Control
                      draft-ietf-quic-recovery-06

Abstract

   This document describes loss detection and congestion control
   mechanisms for QUIC.

Note to Readers

   Discussion of this draft takes place on the QUIC working group
   mailing list (quic@ietf.org), which is archived at
   https://mailarchive.ietf.org/arch/search/?email_list=quic .

   Working Group information can be found at https://github.com/quicwg ;
   source code and issues list for this draft can be found at
   https://github.com/quicwg/base-drafts/labels/recovery .

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on March 26, 2018.

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.
   Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Notational Conventions
   2.  Design of the QUIC Transmission Machinery
     2.1.  Relevant Differences Between QUIC and TCP
       2.1.1.  Monotonically Increasing Packet Numbers
       2.1.2.  No Reneging
       2.1.3.  More ACK Ranges
       2.1.4.  Explicit Correction For Delayed Acks
   3.  Loss Detection
     3.1.  Overview
     3.2.  Algorithm Details
       3.2.1.  Constants of interest
       3.2.2.  Variables of interest
       3.2.3.  Initialization
       3.2.4.  On Sending a Packet
       3.2.5.  On Ack Receipt
       3.2.6.  On Packet Acknowledgment
       3.2.7.  Setting the Loss Detection Alarm
       3.2.8.  On Alarm Firing
       3.2.9.  Detecting Lost Packets
     3.3.  Discussion
   4.  Congestion Control
     4.1.  Slow Start
     4.2.  Congestion Avoidance
     4.3.  Recovery Period
     4.4.  Tail Loss Probe
     4.5.  Retransmission Timeout
     4.6.  Pacing Rate
     4.7.  Pseudocode
       4.7.1.  Constants of interest
       4.7.2.  Variables of interest
       4.7.3.  Initialization
       4.7.4.  On Packet Sent
       4.7.5.  On Packet Acknowledgement
       4.7.6.  On Packets Lost
       4.7.7.  On Retransmission Timeout Verified
   5.  IANA Considerations
   6.  References
     6.1.  Normative References
     6.2.  Informative References
   Appendix A.  Acknowledgments
   Appendix B.  Change Log
     B.1.  Since draft-ietf-quic-recovery-05
     B.2.  Since draft-ietf-quic-recovery-04
     B.3.  Since draft-ietf-quic-recovery-03
     B.4.  Since draft-ietf-quic-recovery-02
     B.5.  Since draft-ietf-quic-recovery-01
     B.6.  Since draft-ietf-quic-recovery-00
     B.7.  Since draft-iyengar-quic-loss-recovery-01
   Authors' Addresses

1.  Introduction

   QUIC is a new multiplexed and secure transport atop UDP. QUIC builds
   on decades of transport and security experience, and implements
   mechanisms that make it attractive as a modern general-purpose
   transport. The QUIC protocol is described in [QUIC-TRANSPORT].

   QUIC implements the spirit of known TCP loss recovery mechanisms,
   described in RFCs, various Internet-Drafts, and also those prevalent
   in the Linux TCP implementation. This document describes QUIC
   congestion control and loss recovery, and where applicable,
   attributes the TCP equivalent in RFCs, Internet-Drafts, academic
   papers, and/or TCP implementations.

1.1.  Notational Conventions

   The words "MUST", "MUST NOT", "SHOULD", and "MAY" are used in this
   document. It's not shouting; when they are capitalized, they have
   the special meaning defined in [RFC2119].

2.  Design of the QUIC Transmission Machinery

   All transmissions in QUIC are sent with a packet-level header, which
   includes a packet sequence number (referred to below as a packet
   number). These packet numbers never repeat in the lifetime of a
   connection, and are monotonically increasing, which makes duplicate
   detection trivial. This fundamental design decision obviates the
   need for disambiguating between transmissions and retransmissions and
   eliminates significant complexity from QUIC's interpretation of TCP
   loss detection mechanisms.

   Every packet may contain several frames. We outline the frames that
   are important to the loss detection and congestion control machinery
   below.

   o  Retransmittable frames are frames requiring reliable delivery.
      The most common are STREAM frames, which typically contain
      application data.

   o  Crypto handshake data is sent on stream 0, and uses the
      reliability machinery of QUIC underneath.

   o  ACK frames contain acknowledgment information. QUIC uses a
      SACK-based scheme, where acks express up to 256 ranges. The ACK
      frame also includes a receive timestamp for each packet newly
      acked.

2.1.  Relevant Differences Between QUIC and TCP

   Readers familiar with TCP's loss detection and congestion control
   will find algorithms here that parallel well-known TCP ones.
   Protocol differences between QUIC and TCP, however, contribute to
   algorithmic differences. We briefly describe these protocol
   differences below.

2.1.1.  Monotonically Increasing Packet Numbers

   TCP conflates transmission sequence number at the sender with
   delivery sequence number at the receiver, which results in
   retransmissions of the same data carrying the same sequence number,
   and consequently in problems caused by "retransmission ambiguity".
   QUIC separates the two: QUIC uses a packet sequence number (referred
   to as the "packet number") for transmissions, and any data that is to
   be delivered to the receiving application(s) is sent in one or more
   streams, with delivery order determined by stream offsets encoded
   within STREAM frames.

   QUIC's packet number is strictly increasing, and directly encodes
   transmission order. A higher QUIC packet number signifies that the
   packet was sent later, and a lower QUIC packet number signifies that
   the packet was sent earlier. When a packet containing frames is
   deemed lost, QUIC rebundles necessary frames in a new packet with a
   new packet number, removing ambiguity about which packet is
   acknowledged when an ACK is received. Consequently, more accurate
   RTT measurements can be made, spurious retransmissions are trivially
   detected, and mechanisms such as Fast Retransmit can be applied
   universally, based only on packet number.

   This design point significantly simplifies loss detection mechanisms
   for QUIC. Most TCP mechanisms implicitly attempt to infer
   transmission ordering based on TCP sequence numbers - a non-trivial
   task, especially when TCP timestamps are not available.

2.1.2.  No Reneging

   QUIC ACKs contain information that is equivalent to TCP SACK, but
   QUIC does not allow any acked packet to be reneged, greatly
   simplifying implementations on both sides and reducing memory
   pressure on the sender.

2.1.3.  More ACK Ranges

   QUIC supports up to 256 ACK ranges, as opposed to TCP's 3 SACK
   ranges. In high loss environments, this speeds recovery.

2.1.4.  Explicit Correction For Delayed Acks

   QUIC ACKs explicitly encode the delay incurred at the receiver
   between when a packet is received and when the corresponding ACK is
   sent. This allows the receiver of the ACK to adjust for receiver
   delays, specifically the delayed ack timer, when estimating the path
   RTT. This mechanism also allows a receiver to measure and report the
   delay from when a packet was received by the OS kernel, which is
   useful in receivers that may incur delays such as context-switch
   latency before a userspace QUIC receiver processes a received packet.

3.  Loss Detection

3.1.  Overview

   QUIC uses a combination of ack information and alarms to detect lost
   packets. An unacknowledged QUIC packet is marked as lost in one of
   the following ways:

   o  A packet is marked as lost if at least one packet that was sent a
      threshold number of packets (kReorderingThreshold) after it has
      been acknowledged. This indicates that the unacknowledged packet
      is either lost or reordered beyond the specified threshold. This
      mechanism combines both TCP's Fast Retransmit and FACK mechanisms.

   o  If a packet is near the tail, where fewer than
      kReorderingThreshold packets are sent after it, the sender cannot
      expect to detect loss based on the previous mechanism. In this
      case, a sender uses both ack information and an alarm to detect
      loss.
      Specifically, when the last sent packet is acknowledged,
      the sender waits a short period of time to allow for reordering
      and then marks any unacknowledged packets as lost. This mechanism
      is based on the Linux implementation of TCP Early Retransmit.

   o  If a packet is sent at the tail, there are no packets sent after
      it, and the sender cannot use ack information to detect its loss.
      The sender therefore relies on an alarm to detect such tail
      losses. This mechanism is based on TCP's Tail Loss Probe.

   o  If all else fails, a Retransmission Timeout (RTO) alarm is always
      set when any retransmittable packet is outstanding. When this
      alarm fires, all unacknowledged packets are marked as lost.

   o  Instead of a packet threshold to tolerate reordering, a QUIC
      sender may use a time threshold. This allows senders to tolerate
      short periods of significant reordering. In this mechanism, a
      QUIC sender marks a packet as lost when a larger packet number is
      acknowledged and a threshold amount of time has passed since the
      packet was sent.

   o  Handshake packets, which contain STREAM frames for stream 0, are
      critical to QUIC transport and crypto negotiation, so a separate
      alarm period is used for them.

3.2.  Algorithm Details

3.2.1.  Constants of interest

   Constants used in loss recovery are based on a combination of RFCs,
   papers, and common practice. Some may need to be changed or
   negotiated in order to better suit a variety of environments.

   kMaxTLPs (default 2):  Maximum number of tail loss probes before an
      RTO fires.

   kReorderingThreshold (default 3):  Maximum reordering in packet
      number space before FACK-style loss detection considers a packet
      lost.

   kTimeReorderingFraction (default 1/8):  Maximum reordering in time
      space before time-based loss detection considers a packet lost,
      expressed as a fraction of an RTT.

   kMinTLPTimeout (default 10ms):  Minimum time in the future a tail
      loss probe alarm may be set for.

   kMinRTOTimeout (default 200ms):  Minimum time in the future an RTO
      alarm may be set for.

   kDelayedAckTimeout (default 25ms):  The length of the peer's delayed
      ack timer.

   kDefaultInitialRtt (default 100ms):  The default RTT used before an
      RTT sample is taken.

3.2.2.  Variables of interest

   Variables required to implement the loss detection mechanisms are
   described in this section.

   loss_detection_alarm:  Multi-modal alarm used for loss detection.

   handshake_count:  The number of times the handshake packets have been
      retransmitted without receiving an ack.

   tlp_count:  The number of times a tail loss probe has been sent
      without receiving an ack.

   rto_count:  The number of times an RTO has been sent without
      receiving an ack.

   largest_sent_before_rto:  The last packet number sent prior to the
      first retransmission timeout.

   time_of_last_sent_packet:  The time the most recent packet was sent.

   largest_sent_packet:  The packet number of the most recently sent
      packet.

   largest_acked_packet:  The largest packet number acknowledged in an
      ack frame.

   latest_rtt:  The most recent RTT measurement made when receiving an
      ack for a previously unacked packet.

   smoothed_rtt:  The smoothed RTT of the connection, computed as
      described in [RFC6298].

   rttvar:  The RTT variance, computed as described in [RFC6298].

   reordering_threshold:  The largest delta between the largest acked
      retransmittable packet and a packet containing retransmittable
      frames before it's declared lost.

   time_reordering_fraction:  The reordering window as a fraction of
      max(smoothed_rtt, latest_rtt).

   loss_time:  The time at which the next packet will be considered lost
      based on early retransmit or exceeding the reordering window in
      time.

   sent_packets:  An association of packet numbers to information about
      them, including a packet_number field indicating the packet
      number, a time field indicating the time a packet was sent, and a
      bytes field indicating the packet's size. sent_packets is ordered
      by packet number, and packets remain in sent_packets until
      acknowledged or lost.

3.2.3.  Initialization

   At the beginning of the connection, initialize the loss detection
   variables as follows:

   loss_detection_alarm.reset()
   handshake_count = 0
   tlp_count = 0
   rto_count = 0
   if (UsingTimeLossDetection()):
     reordering_threshold = infinite
     time_reordering_fraction = kTimeReorderingFraction
   else:
     reordering_threshold = kReorderingThreshold
     time_reordering_fraction = infinite
   loss_time = 0
   smoothed_rtt = 0
   rttvar = 0
   largest_sent_before_rto = 0
   time_of_last_sent_packet = 0
   largest_sent_packet = 0

3.2.4.  On Sending a Packet

   After any packet is sent, be it a new transmission or a rebundled
   transmission, the following OnPacketSent function is called. The
   parameters to OnPacketSent are as follows:

   o  packet_number: The packet number of the sent packet.

   o  is_ack_only: A boolean that indicates whether a packet only
      contains an ACK frame. If true, it is still expected an ack will
      be received for this packet, but it is not congestion controlled.

   o  sent_bytes: The number of bytes sent in the packet, not including
      UDP or IP overhead, but including QUIC framing overhead.

   Pseudocode for OnPacketSent follows:

   OnPacketSent(packet_number, is_ack_only, sent_bytes):
     time_of_last_sent_packet = now
     largest_sent_packet = packet_number
     sent_packets[packet_number].packet_number = packet_number
     sent_packets[packet_number].time = now
     if !is_ack_only:
       OnPacketSentCC(sent_bytes)
       sent_packets[packet_number].bytes = sent_bytes
       SetLossDetectionAlarm()

3.2.5.  On Ack Receipt

   When an ack is received, it may acknowledge 0 or more packets.

   Pseudocode for OnAckReceived and UpdateRtt follows:

   OnAckReceived(ack):
     largest_acked_packet = ack.largest_acked
     // If the largest acked is newly acked, update the RTT.
     if (sent_packets[ack.largest_acked]):
       latest_rtt = now - sent_packets[ack.largest_acked].time
       // Subtract the peer's reported ack delay when possible.
       if (latest_rtt > ack.ack_delay):
         latest_rtt -= ack.ack_delay
       UpdateRtt(latest_rtt)
     // Find all newly acked packets.
     for acked_packet in DetermineNewlyAckedPackets():
       OnPacketAcked(acked_packet.packet_number)

     DetectLostPackets(ack.largest_acked)
     SetLossDetectionAlarm()

   UpdateRtt(latest_rtt):
     // Based on [RFC6298].
     if (smoothed_rtt == 0):
       smoothed_rtt = latest_rtt
       rttvar = latest_rtt / 2
     else:
       rttvar = 3/4 * rttvar + 1/4 * abs(smoothed_rtt - latest_rtt)
       smoothed_rtt = 7/8 * smoothed_rtt + 1/8 * latest_rtt

3.2.6.  On Packet Acknowledgment

   When a packet is acked for the first time, the following
   OnPacketAcked function is called. Note that a single ACK frame may
   newly acknowledge several packets. OnPacketAcked must be called once
   for each of these newly acked packets.

   OnPacketAcked takes one parameter, acked_packet_number, which is the
   packet number of the newly acked packet. Lost packets are detected
   separately by DetectLostPackets.

   If this is the first acknowledgement following RTO, check if the
   smallest newly acknowledged packet is one sent by the RTO, and if so,
   inform congestion control of a verified RTO, similar to F-RTO
   [RFC5682].

   Pseudocode for OnPacketAcked follows:

   OnPacketAcked(acked_packet_number):
     // Congestion control needs the packet's size, so pass the entry.
     OnPacketAckedCC(sent_packets[acked_packet_number])
     // If a packet sent prior to RTO was acked, then the RTO
     // was spurious. Otherwise, inform congestion control.
     if (rto_count > 0 &&
         acked_packet_number > largest_sent_before_rto):
       OnRetransmissionTimeoutVerified()
     handshake_count = 0
     tlp_count = 0
     rto_count = 0
     sent_packets.remove(acked_packet_number)

3.2.7.  Setting the Loss Detection Alarm

   QUIC loss detection uses a single alarm for all timer-based loss
   detection. The duration of the alarm is based on the alarm's mode,
   which is set in the packet and timer events further below. The
   function SetLossDetectionAlarm defined below shows how the single
   timer is set based on the alarm mode.

3.2.7.1.  Handshake Packets

   The initial flight has no prior RTT sample. A client SHOULD remember
   the previous RTT it observed when resumption is attempted and use
   that for an initial RTT value. If no previous RTT is available, the
   initial RTT defaults to 100ms.

   Endpoints MUST retransmit handshake frames if not acknowledged within
   a time limit. This time limit will start as the largest of twice the
   RTT value and kMinTLPTimeout. Each consecutive handshake
   retransmission doubles the time limit, until an acknowledgement is
   received.

   Handshake frames may be cancelled by handshake state transitions. In
   particular, all non-protected frames SHOULD no longer be
   transmitted once packet protection is available.

   When stateless rejects are in use, the connection is considered
   immediately closed once a reject is sent, so no timer is set to
   retransmit the reject.

   Version negotiation packets are always stateless, and MUST be sent
   once per handshake packet that uses an unsupported QUIC version, and
   MAY be sent in response to 0-RTT packets.

3.2.7.2.  Tail Loss Probe and Retransmission Timeout

   Tail loss probes [LOSS-PROBE] and retransmission timeouts [RFC6298]
   are alarm-based mechanisms to recover from cases when there are
   outstanding retransmittable packets, but an acknowledgement has not
   been received in a timely manner.

3.2.7.3.  Early Retransmit

   Early retransmit [RFC5827] is implemented with a 1/4 RTT timer. It
   is part of QUIC's time-based loss detection, but is always enabled,
   even when only packet reordering loss detection is enabled.

3.2.7.4.  Pseudocode

   Pseudocode for SetLossDetectionAlarm follows:

   SetLossDetectionAlarm():
     if (retransmittable packets are not outstanding):
       loss_detection_alarm.cancel()
       return

     if (handshake packets are outstanding):
       // Handshake retransmission alarm.
       if (smoothed_rtt == 0):
         alarm_duration = 2 * kDefaultInitialRtt
       else:
         alarm_duration = 2 * smoothed_rtt
       alarm_duration = max(alarm_duration, kMinTLPTimeout)
       alarm_duration = alarm_duration * (2 ^ handshake_count)
     else if (loss_time != 0):
       // Early retransmit timer or time loss detection.
       alarm_duration = loss_time - now
     else if (tlp_count < kMaxTLPs):
       // Tail Loss Probe
       if (retransmittable_packets_outstanding == 1):
         alarm_duration = 1.5 * smoothed_rtt + kDelayedAckTimeout
       else:
         alarm_duration = kMinTLPTimeout
       alarm_duration = max(alarm_duration, 2 * smoothed_rtt)
     else:
       // RTO alarm
       alarm_duration = smoothed_rtt + 4 * rttvar
       alarm_duration = max(alarm_duration, kMinRTOTimeout)
       alarm_duration = alarm_duration * (2 ^ rto_count)

     loss_detection_alarm.set(now + alarm_duration)

3.2.8.  On Alarm Firing

   QUIC uses one loss recovery alarm, which, when set, can be in one of
   several modes. When the alarm fires, the mode determines the action
   to be performed.
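
   As an illustration of the modes described above, the mode selection
   shared by SetLossDetectionAlarm and OnLossDetectionAlarm, together
   with the duration computation of Section 3.2.7.4, can be sketched in
   Python. This is a non-normative sketch: the names AlarmMode,
   alarm_mode, and alarm_duration, the keyword parameters, and the use
   of seconds as the time unit are assumptions made for illustration
   and are not part of this document's pseudocode.

```python
from enum import Enum

# Constants from Section 3.2.1, expressed in seconds (an assumption).
K_MAX_TLPS = 2
K_MIN_TLP_TIMEOUT = 0.010
K_MIN_RTO_TIMEOUT = 0.200
K_DELAYED_ACK_TIMEOUT = 0.025
K_DEFAULT_INITIAL_RTT = 0.100


class AlarmMode(Enum):
    """Hypothetical names for the four modes of the single alarm."""
    HANDSHAKE = "handshake"
    LOSS_TIME = "loss_time"  # early retransmit or time loss detection
    TLP = "tlp"
    RTO = "rto"


def alarm_mode(handshake_outstanding, loss_time, tlp_count):
    """Pick the mode in the same order as the pseudocode's if/else chain."""
    if handshake_outstanding:
        return AlarmMode.HANDSHAKE
    if loss_time != 0:
        return AlarmMode.LOSS_TIME
    if tlp_count < K_MAX_TLPS:
        return AlarmMode.TLP
    return AlarmMode.RTO


def alarm_duration(mode, *, now=0.0, smoothed_rtt=0.0, rttvar=0.0,
                   loss_time=0.0, handshake_count=0, rto_count=0,
                   retransmittable_outstanding=1):
    """Return the alarm duration in seconds for the given mode."""
    if mode is AlarmMode.HANDSHAKE:
        # Fall back to the default initial RTT before any RTT sample.
        rtt = smoothed_rtt if smoothed_rtt != 0 else K_DEFAULT_INITIAL_RTT
        duration = max(2 * rtt, K_MIN_TLP_TIMEOUT)
        return duration * (2 ** handshake_count)
    if mode is AlarmMode.LOSS_TIME:
        return loss_time - now
    if mode is AlarmMode.TLP:
        if retransmittable_outstanding == 1:
            duration = 1.5 * smoothed_rtt + K_DELAYED_ACK_TIMEOUT
        else:
            duration = K_MIN_TLP_TIMEOUT
        return max(duration, 2 * smoothed_rtt)
    # RTO: exponential backoff on consecutive timeouts.
    duration = max(smoothed_rtt + 4 * rttvar, K_MIN_RTO_TIMEOUT)
    return duration * (2 ** rto_count)
```

   For example, with no RTT sample and one prior handshake
   retransmission, alarm_duration(AlarmMode.HANDSHAKE,
   handshake_count=1) evaluates to about 0.4 seconds: twice the 100ms
   default initial RTT, doubled once.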

   Pseudocode for OnLossDetectionAlarm follows:

   OnLossDetectionAlarm():
     if (handshake packets are outstanding):
       // Handshake retransmission alarm.
       RetransmitAllHandshakePackets()
       handshake_count++
     else if (loss_time != 0):
       // Early retransmit or Time Loss Detection
       DetectLostPackets(largest_acked_packet)
     else if (tlp_count < kMaxTLPs):
       // Tail Loss Probe.
       SendOnePacket()
       tlp_count++
     else:
       // RTO.
       if (rto_count == 0):
         largest_sent_before_rto = largest_sent_packet
       SendTwoPackets()
       rto_count++

     SetLossDetectionAlarm()

3.2.9.  Detecting Lost Packets

   Packets in QUIC are only considered lost once a larger packet number
   is acknowledged. DetectLostPackets is called every time an ack is
   received. If the loss detection alarm fires and the loss_time is
   set, the previous largest acked packet is supplied.

3.2.9.1.  Handshake Packets

   The receiver MUST ignore unprotected packets that ack protected
   packets. The receiver MUST trust protected acks for unprotected
   packets, however. Aside from this, loss detection for handshake
   packets when an ack is processed is identical to other packets.

3.2.9.2.  Pseudocode

   DetectLostPackets takes one parameter, largest_acked, which is the
   packet number of the largest acked packet.

   Pseudocode for DetectLostPackets follows:

   DetectLostPackets(largest_acked):
     loss_time = 0
     lost_packets = {}
     delay_until_lost = infinite
     if (time_reordering_fraction != infinite):
       delay_until_lost =
         (1 + time_reordering_fraction) * max(latest_rtt, smoothed_rtt)
     else if (largest_acked == largest_sent_packet):
       // Early retransmit alarm.
       delay_until_lost = 9/8 * max(latest_rtt, smoothed_rtt)
     // Only packets sent before the largest acked packet can be lost.
     foreach (unacked in sent_packets
              with unacked.packet_number < largest_acked):
       time_since_sent = now - unacked.time
       packet_delta = largest_acked - unacked.packet_number
       if (time_since_sent > delay_until_lost):
         lost_packets.insert(unacked)
       else if (packet_delta > reordering_threshold):
         lost_packets.insert(unacked)
       else if (loss_time == 0 && delay_until_lost != infinite):
         loss_time = now + delay_until_lost - time_since_sent

     // Inform the congestion controller of lost packets and
     // let it decide whether to retransmit immediately.
     if (!lost_packets.empty()):
       OnPacketsLost(lost_packets)
       foreach (packet in lost_packets):
         sent_packets.remove(packet.packet_number)

3.3.  Discussion

   The majority of constants were derived from best common practices
   among widely deployed TCP implementations on the Internet.
   Exceptions follow.

   A shorter delayed ack time of 25ms was chosen because longer delayed
   acks can delay loss recovery and, for the small number of connections
   where less than one packet per 25ms is delivered, acking every packet
   is beneficial to congestion control and loss recovery.

   The default initial RTT of 100ms was chosen because it is slightly
   higher than both the median and mean min_rtt typically observed on
   the public Internet.

4.  Congestion Control

   QUIC's congestion control is based on TCP NewReno [RFC6582]
   congestion control to determine the congestion window and pacing
   rate. QUIC congestion control is specified in bytes due to finer
   control and the ease of appropriate byte counting [RFC3465].

4.1.  Slow Start

   QUIC begins every connection in slow start and exits slow start upon
   loss. QUIC re-enters slow start after a retransmission timeout.
   While in slow start, QUIC increases the congestion window by the
   number of acknowledged bytes when each ack is processed.

4.2.  Congestion Avoidance

   Slow start exits to congestion avoidance. Congestion avoidance in
   NewReno uses an additive increase multiplicative decrease (AIMD)
   approach that increases the congestion window by one MSS per
   congestion window acknowledged. When a loss is detected, NewReno
   halves the congestion window and sets the slow start threshold to the
   new congestion window.

4.3.  Recovery Period

   Recovery is a period of time beginning with detection of a lost
   packet. Because QUIC retransmits frames, not packets, it defines the
   end of recovery as all packets outstanding at the start of recovery
   being acknowledged or lost. This is slightly different from TCP's
   definition of recovery ending when the lost packet that started
   recovery is acknowledged. During recovery, the congestion window is
   not increased or decreased. As such, multiple lost packets only
   decrease the congestion window once as long as they're lost before
   exiting recovery. This causes QUIC to decrease the congestion window
   multiple times if retransmissions are lost, but limits the reduction
   to once per round trip.

4.4.  Tail Loss Probe

   If recovery sends a tail loss probe, no change is made to the
   congestion window or pacing rate. Acknowledgement or loss of a tail
   loss probe is treated like that of any other packet.

4.5.  Retransmission Timeout

   When retransmissions are sent due to a retransmission timeout alarm,
   no change is made to the congestion window or pacing rate until the
   next acknowledgement arrives. When an ack arrives, if packets prior
   to the first retransmission timeout are acknowledged, then the
   congestion window remains the same. If no packets prior to the first
   retransmission timeout are acknowledged, the retransmission timeout
   has been verified: the congestion window must be reduced to the
   minimum congestion window, and slow start begins.

4.6.  Pacing Rate

   The pacing rate is a function of the mode, the congestion window, and
   the smoothed RTT. Specifically, the pacing rate is 2 times the
   congestion window divided by the smoothed RTT during slow start and
   1.25 times the congestion window divided by the smoothed RTT during
   congestion avoidance. In order to fairly compete with flows that are
   not pacing, it is recommended to not pace the first 10 sent packets
   when exiting quiescence.

4.7.  Pseudocode

4.7.1.  Constants of interest

   Constants used in congestion control are based on a combination of
   RFCs, papers, and common practice. Some may need to be changed or
   negotiated in order to better suit a variety of environments.

   kDefaultMss (default 1460 bytes):  The default max packet size used
      for calculating default and minimum congestion windows.

   kInitialWindow (default 10 * kDefaultMss):  Default limit on the
      amount of outstanding data in bytes.

   kMinimumWindow (default 2 * kDefaultMss):  Default minimum congestion
      window.

   kLossReductionFactor (default 0.5):  Reduction in congestion window
      when a new loss event is detected.

4.7.2.  Variables of interest

   Variables required to implement the congestion control mechanisms are
   described in this section.

   bytes_in_flight:  The sum of the size in bytes of all sent packets
      that contain at least one retransmittable or PADDING frame, and
      have not been acked or declared lost. The size does not include
      IP or UDP overhead. Packets only containing ack frames do not
      count towards bytes_in_flight to ensure congestion control does
      not impede congestion feedback.

   congestion_window:  Maximum number of bytes in flight that may be
      sent.

   end_of_recovery:  The largest packet number sent when QUIC detects a
      loss. When a larger packet is acknowledged, QUIC exits recovery.

   ssthresh:  Slow start threshold in bytes.
      When the congestion window is below ssthresh, the mode is slow
      start and the window grows by the number of bytes acknowledged.

4.7.3.  Initialization

   At the beginning of the connection, initialize the congestion control
   variables as follows:

      congestion_window = kInitialWindow
      bytes_in_flight = 0
      end_of_recovery = 0
      ssthresh = infinite

4.7.4.  On Packet Sent

   Whenever a packet containing non-ACK frames is sent, its size is
   added to bytes_in_flight.

      OnPacketSentCC(bytes_sent):
        bytes_in_flight += bytes_sent

4.7.5.  On Packet Acknowledgement

   Invoked from loss detection's OnPacketAcked and is supplied with
   acked_packet from sent_packets.

      OnPacketAckedCC(acked_packet):
        // Remove from bytes_in_flight.
        bytes_in_flight -= acked_packet.bytes
        if (acked_packet.packet_number < end_of_recovery):
          // Do not increase congestion window in recovery period.
          return
        if (congestion_window < ssthresh):
          // Slow start.
          congestion_window += acked_packet.bytes
        else:
          // Congestion avoidance.
          congestion_window +=
            kDefaultMss * acked_packet.bytes / congestion_window

4.7.6.  On Packets Lost

   Invoked by loss detection from DetectLostPackets when new packets are
   detected lost.

      OnPacketsLost(lost_packets):
        // Remove lost packets from bytes_in_flight.
        for (lost_packet : lost_packets):
          bytes_in_flight -= lost_packet.bytes
        largest_lost_packet = lost_packets.last()
        // Start a new recovery epoch if the lost packet is larger
        // than the end of the previous recovery epoch.
        if (end_of_recovery < largest_lost_packet.packet_number):
          end_of_recovery = largest_sent_packet
          congestion_window *= kLossReductionFactor
          congestion_window = max(congestion_window, kMinimumWindow)
          ssthresh = congestion_window

4.7.7.
On Retransmission Timeout Verified

   QUIC decreases the congestion window to the minimum value once the
   retransmission timeout has been verified.

      OnRetransmissionTimeoutVerified():
        congestion_window = kMinimumWindow

5.  IANA Considerations

   This document has no IANA actions.  Yet.

6.  References

6.1.  Normative References

   [QUIC-TRANSPORT]
              Iyengar, J., Ed. and M. Thomson, Ed., "QUIC: A UDP-Based
              Multiplexed and Secure Transport",
              draft-ietf-quic-transport (work in progress),
              September 2017.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

6.2.  Informative References

   [LOSS-PROBE]
              Dukkipati, N., Cardwell, N., Cheng, Y., and M. Mathis,
              "Tail Loss Probe (TLP): An Algorithm for Fast Recovery of
              Tail Losses", draft-dukkipati-tcpm-tcp-loss-probe-01 (work
              in progress), February 2013.

   [RFC3465]  Allman, M., "TCP Congestion Control with Appropriate Byte
              Counting (ABC)", RFC 3465, DOI 10.17487/RFC3465,
              February 2003.

   [RFC5682]  Sarolahti, P., Kojo, M., Yamamoto, K., and M. Hata,
              "Forward RTO-Recovery (F-RTO): An Algorithm for Detecting
              Spurious Retransmission Timeouts with TCP", RFC 5682,
              DOI 10.17487/RFC5682, September 2009.

   [RFC5827]  Allman, M., Avrachenkov, K., Ayesta, U., Blanton, J., and
              P. Hurtig, "Early Retransmit for TCP and Stream Control
              Transmission Protocol (SCTP)", RFC 5827,
              DOI 10.17487/RFC5827, May 2010.

   [RFC6298]  Paxson, V., Allman, M., Chu, J., and M. Sargent,
              "Computing TCP's Retransmission Timer", RFC 6298,
              DOI 10.17487/RFC6298, June 2011.

   [RFC6582]  Henderson, T., Floyd, S., Gurtov, A., and Y. Nishida, "The
              NewReno Modification to TCP's Fast Recovery Algorithm",
              RFC 6582, DOI 10.17487/RFC6582, April 2012.

Appendix A.  Acknowledgments

Appendix B.
Change Log

      *RFC Editor's Note:* Please remove this section prior to
      publication of a final version of this document.

B.1.  Since draft-ietf-quic-recovery-05

   o  Add more congestion control text (#776)

B.2.  Since draft-ietf-quic-recovery-04

   No significant changes.

B.3.  Since draft-ietf-quic-recovery-03

   No significant changes.

B.4.  Since draft-ietf-quic-recovery-02

   o  Integrate F-RTO (#544, #409)

   o  Add congestion control (#545, #395)

   o  Require connection abort if a skipped packet was acknowledged
      (#415)

   o  Simplify RTO calculations (#142, #417)

B.5.  Since draft-ietf-quic-recovery-01

   o  Overview added to loss detection

   o  Changed initial default RTT to 100ms

   o  Added time-based loss detection and fixed early retransmit

   o  Clarified loss recovery for handshake packets

   o  Fixed references and made TCP references informative

B.6.  Since draft-ietf-quic-recovery-00

   o  Improved description of constants and ACK behavior

B.7.  Since draft-iyengar-quic-loss-recovery-01

   o  Adopted as base for draft-ietf-quic-recovery

   o  Updated authors/editors list

   o  Added table of contents

Authors' Addresses

   Jana Iyengar (editor)
   Google

   Email: jri@google.com

   Ian Swett (editor)
   Google

   Email: ianswett@google.com