idnits 2.17.1 draft-ietf-quic-recovery-07.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack a Security Considerations section. ** The abstract seems to contain references ([2], [3], [1]), which it shouldn't. Please replace those with straight textual mentions of the documents in question. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year == The document seems to lack the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. (The document does seem to have the reference to RFC 2119 which the ID-Checklist requires). -- The document date (November 14, 2017) is 2352 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) -- Looks like a reference, but probably isn't: '1' on line 1081 -- Looks like a reference, but probably isn't: '2' on line 1083 -- Looks like a reference, but probably isn't: '3' on line 1085 == Outdated reference: A later version (-34) exists of draft-ietf-quic-transport-07 ** Downref: Normative reference to an Experimental RFC: RFC 4653 ** Downref: Normative reference to an Experimental RFC: RFC 5827 -- Duplicate reference: draft-dukkipati-tcpm-tcp-loss-probe, mentioned in 'TLP', was also mentioned in 'LOSS-PROBE'. Summary: 4 errors (**), 0 flaws (~~), 3 warnings (==), 5 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 QUIC J. Iyengar, Ed. 3 Internet-Draft I. Swett, Ed. 4 Intended status: Standards Track Google 5 Expires: May 18, 2018 November 14, 2017 7 QUIC Loss Detection and Congestion Control 8 draft-ietf-quic-recovery-07 10 Abstract 12 This document describes loss detection and congestion control 13 mechanisms for QUIC. 15 Note to Readers 17 Discussion of this draft takes place on the QUIC working group 18 mailing list (quic@ietf.org), which is archived at 19 https://mailarchive.ietf.org/arch/search/?email_list=quic [1]. 21 Working Group information can be found at https://github.com/quicwg 22 [2]; source code and issues list for this draft can be found at 23 https://github.com/quicwg/base-drafts/labels/recovery [3]. 25 Status of This Memo 27 This Internet-Draft is submitted in full conformance with the 28 provisions of BCP 78 and BCP 79. 30 Internet-Drafts are working documents of the Internet Engineering 31 Task Force (IETF). Note that other groups may also distribute 32 working documents as Internet-Drafts. The list of current Internet- 33 Drafts is at https://datatracker.ietf.org/drafts/current/. 35 Internet-Drafts are draft documents valid for a maximum of six months 36 and may be updated, replaced, or obsoleted by other documents at any 37 time. It is inappropriate to use Internet-Drafts as reference 38 material or to cite them other than as "work in progress." 40 This Internet-Draft will expire on May 18, 2018. 42 Copyright Notice 44 Copyright (c) 2017 IETF Trust and the persons identified as the 45 document authors. All rights reserved. 47 This document is subject to BCP 78 and the IETF Trust's Legal 48 Provisions Relating to IETF Documents 49 (https://trustee.ietf.org/license-info) in effect on the date of 50 publication of this document. Please review these documents 51 carefully, as they describe your rights and restrictions with respect 52 to this document. Code Components extracted from this document must 53 include Simplified BSD License text as described in Section 4.e of 54 the Trust Legal Provisions and are provided without warranty as 55 described in the Simplified BSD License. 57 Table of Contents 59 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 60 1.1. Notational Conventions . . . . . . . . . . . . . . . . . 3 61 2. Design of the QUIC Transmission Machinery . . . . . . . . . . 3 62 2.1. Relevant Differences Between QUIC and TCP . . . . . . . . 4 63 2.1.1. Monotonically Increasing Packet Numbers . . . . . . . 4 64 2.1.2. No Reneging . . . . . . . . . . . . . . . . . . . . . 5 65 2.1.3. More ACK Ranges . . . . . . . . . . . . . . . . . . . 5 66 2.1.4. Explicit Correction For Delayed Acks . . . . . . . . 5 67 3. Loss Detection . . . . . . . . . . . . . . . . . . . . . . . 5 68 3.1. Computing the RTT estimate . . . . . . . . . . . . . . . 5 69 3.2. Ack-based Detection . . . . . . . . . . . . . . . . . . . 5 70 3.2.1. Fast Retransmit . . . . . . . . . . . . . . . . . . . 6 71 3.2.2. Early Retransmit . . . . . . . . . . . . . . . . . . 6 72 3.3. Timer-based Detection . . . . . . . . . . . . . . . . . . 7 73 3.3.1. Tail Loss Probe . . . . . . . . . . . . . . . . . . . 7 74 3.3.2. Retransmission Timeout . . . . . . . . . . . . . . . 9 75 3.3.3. Handshake Timeout . . . . . . . . . . . . . . . . . . 10 76 3.4. Algorithm Details . . . . . . . . . . . . . . . . . . . . 10 77 3.4.1. Constants of interest . . . . . . . . . . . . . . . . 10 78 3.4.2. Variables of interest . . . . . . . . . . . . . . . . 11 79 3.4.3. Initialization . . . . . . . . . . . . . . . . . . . 12 80 3.4.4. On Sending a Packet . . . . . . . . . . . . . . . . . 13 81 3.4.5. On Ack Receipt . . . . . . . . . . . . . . . . . . . 13 82 3.4.6. On Packet Acknowledgment . . . . . . . . . . . . . . 14 83 3.4.7. Setting the Loss Detection Alarm . . . . . . . . . . 15 84 3.4.8. On Alarm Firing . . . . . . . . . . . . . . . . . . . 17 85 3.4.9. Detecting Lost Packets . . . . . . . . . . . . . . . 17 86 3.5. Discussion . . . . . . . . . . . . . . . . . . . . . . . 18 87 4. Congestion Control . . . . . . . . . . . . . . . . . . . . . 19 88 4.1. Slow Start . . . . . . . . . . . . . . . . . . . . . . . 19 89 4.2. Congestion Avoidance . . . . . . . . . . . . . . . . . . 19 90 4.3. Recovery Period . . . . . . . . . . . . . . . . . . . . . 19 91 4.4. Tail Loss Probe . . . . . . . . . . . . . . . . . . . . . 19 92 4.5. Retransmission Timeout . . . . . . . . . . . . . . . . . 20 93 4.6. Pacing Rate . . . . . . . . . . . . . . . . . . . . . . . 20 94 4.7. Pseudocode . . . . . . . . . . . . . . . . . . . . . . . 20 95 4.7.1. Constants of interest . . . . . . . . . . . . . . . . 20 96 4.7.2. Variables of interest . . . . . . . . . . . . . . . . 20 97 4.7.3. Initialization . . . . . . . . . . . . . . . . . . . 21 98 4.7.4. On Packet Sent . . . . . . . . . . . . . . . . . . . 21 99 4.7.5. On Packet Acknowledgement . . . . . . . . . . . . . . 21 100 4.7.6. On Packets Lost . . . . . . . . . . . . . . . . . . . 22 101 4.7.7. On Retransmission Timeout Verified . . . . . . . . . 22 102 5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 22 103 6. References . . . . . . . . . . . . . . . . . . . . . . . . . 23 104 6.1. Normative References . . . . . . . . . . . . . . . . . . 23 105 6.2. Informative References . . . . . . . . . . . . . . . . . 24 106 6.3. URIs . . . . . . . . . . . . . . . . . . . . . . . . . . 24 107 Appendix A. Acknowledgments . . . . . . . . . . . . . . . . . . 24 108 Appendix B. Change Log . . . . . . . . . . . . . . . . . . . . . 24 109 B.1. Since draft-ietf-quic-recovery-06 . . . . . . . . . . . . 24 110 B.2. Since draft-ietf-quic-recovery-05 . . . . . . . . . . . . 24 111 B.3. Since draft-ietf-quic-recovery-04 . . . . . . . . . . . . 25 112 B.4. Since draft-ietf-quic-recovery-03 . . . . . . . . . . . . 25 113 B.5. Since draft-ietf-quic-recovery-02 . . . . . . . . . . . . 25 114 B.6. Since draft-ietf-quic-recovery-01 . . . . . . . . . . . . 25 115 B.7. Since draft-ietf-quic-recovery-00 . . . . . . . . . . . . 25 116 B.8. Since draft-iyengar-quic-loss-recovery-01 . . . . . . . . 25 117 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 25 119 1. Introduction 121 QUIC is a new multiplexed and secure transport atop UDP. QUIC builds 122 on decades of transport and security experience, and implements 123 mechanisms that make it attractive as a modern general-purpose 124 transport. The QUIC protocol is described in [QUIC-TRANSPORT]. 126 QUIC implements the spirit of known TCP loss recovery mechanisms, 127 described in RFCs, various Internet-drafts, and also those prevalent 128 in the Linux TCP implementation. This document describes QUIC 129 congestion control and loss recovery, and where applicable, 130 attributes the TCP equivalent in RFCs, Internet-drafts, academic 131 papers, and/or TCP implementations. 133 1.1. Notational Conventions 135 The words "MUST", "MUST NOT", "SHOULD", and "MAY" are used in this 136 document. It's not shouting; when they are capitalized, they have 137 the special meaning defined in [RFC2119]. 139 2. Design of the QUIC Transmission Machinery 141 All transmissions in QUIC are sent with a packet-level header, which 142 includes a packet sequence number (referred to below as a packet 143 number). These packet numbers never repeat in the lifetime of a 144 connection, and are monotonically increasing, which makes duplicate 145 detection trivial. This fundamental design decision obviates the 146 need for disambiguating between transmissions and retransmissions and 147 eliminates significant complexity from QUIC's interpretation of TCP 148 loss detection mechanisms. 150 Every packet may contain several frames. We outline the frames that 151 are important to the loss detection and congestion control machinery 152 below. 154 o Retransmittable frames are frames requiring reliable delivery. 155 The most common are STREAM frames, which typically contain 156 application data. 158 o Crypto handshake data is sent on stream 0, and uses the 159 reliability machinery of QUIC underneath. 161 o ACK frames contain acknowledgment information. QUIC uses a SACK- 162 based scheme, where acks express up to 256 ranges. 164 2.1. Relevant Differences Between QUIC and TCP 166 Readers familiar with TCP's loss detection and congestion control 167 will find algorithms here that parallel well-known TCP ones. 168 Protocol differences between QUIC and TCP however contribute to 169 algorithmic differences. We briefly describe these protocol 170 differences below. 172 2.1.1. Monotonically Increasing Packet Numbers 174 TCP conflates transmission sequence number at the sender with 175 delivery sequence number at the receiver, which results in 176 retransmissions of the same data carrying the same sequence number, 177 and consequently to problems caused by "retransmission ambiguity". 178 QUIC separates the two: QUIC uses a packet sequence number (referred 179 to as the "packet number") for transmissions, and any data that is to 180 be delivered to the receiving application(s) is sent in one or more 181 streams, with stream offsets encoded within STREAM frames inside of 182 packets that determine delivery order. 184 QUIC's packet number is strictly increasing, and directly encodes 185 transmission order. A higher QUIC packet number signifies that the 186 packet was sent later, and a lower QUIC packet number signifies that 187 the packet was sent earlier. When a packet containing frames is 188 deemed lost, QUIC rebundles necessary frames in a new packet with a 189 new packet number, removing ambiguity about which packet is 190 acknowledged when an ACK is received. Consequently, more accurate 191 RTT measurements can be made, spurious retransmissions are trivially 192 detected, and mechanisms such as Fast Retransmit can be applied 193 universally, based only on packet number. 195 This design point significantly simplifies loss detection mechanisms 196 for QUIC. Most TCP mechanisms implicitly attempt to infer 197 transmission ordering based on TCP sequence numbers - a non-trivial 198 task, especially when TCP timestamps are not available. 200 2.1.2. No Reneging 202 QUIC ACKs contain information that is equivalent to TCP SACK, but 203 QUIC does not allow any acked packet to be reneged, greatly 204 simplifying implementations on both sides and reducing memory 205 pressure on the sender. 207 2.1.3. More ACK Ranges 209 QUIC supports up to 256 ACK ranges, opposed to TCP's 3 SACK ranges. 210 In high loss environments, this speeds recovery. 212 2.1.4. Explicit Correction For Delayed Acks 214 QUIC ACKs explicitly encode the delay incurred at the receiver 215 between when a packet is received and when the corresponding ACK is 216 sent. This allows the receiver of the ACK to adjust for receiver 217 delays, specifically the delayed ack timer, when estimating the path 218 RTT. This mechanism also allows a receiver to measure and report the 219 delay from when a packet was received by the OS kernel, which is 220 useful in receivers which may incur delays such as context-switch 221 latency before a userspace QUIC receiver processes a received packet. 223 3. Loss Detection 225 QUIC senders use both ack information and timeouts to detect lost 226 packets, and this section provides a description of these algorithms. 227 Estimating the network round-trip time (RTT) is critical to these 228 algorithms and is described first. 230 3.1. Computing the RTT estimate 232 (To be filled) 234 3.2. Ack-based Detection 236 Ack-based loss detection implements the spirit of TCP's Fast 237 Retransmit [RFC5681], Early Retransmit [RFC5827], FACK, and SACK loss 238 recovery [RFC6675]. This section provides an overview of how these 239 algorithms are implemented in QUIC. 241 (TODO: Define unacknowledged packet, ackable packet, outstanding 242 bytes.) 244 3.2.1. Fast Retransmit 246 An unacknowledged packet is marked as lost when an acknowledgment is 247 received for a packet that was sent a threshold number of packets 248 (kReorderingThreshold) after the unacknowledged packet. Receipt of 249 the ack indicates that a later packet was received, while 250 kReorderingThreshold provides some tolerance for reordering of 251 packets in the network. 253 The RECOMMENDED initial value for kReorderingThreshold is 3. 255 We derive this default from recommendations for TCP loss recovery 256 [RFC5681] [RFC6675]. It is possible for networks to exhibit higher 257 degrees of reordering, causing a sender to detect spurious losses. 258 Detecting spurious losses leads to unnecessary retransmissions and 259 may result in degraded performance due to the actions of the 260 congestion controller upon detecting loss. Implementers MAY use 261 algorithms developed for TCP, such as TCP-NCR [RFC4653], to improve 262 QUIC's reordering resilience, though care should be taken to map TCP 263 specifics to QUIC correctly. Similarly, using time-based loss 264 detection to deal with reordering, such as in PR-TCP, should be more 265 readily usable in QUIC. Making QUIC deal with such networks is 266 important open research, and implementers are encouraged to explore 267 this space. 269 3.2.2. Early Retransmit 271 Unacknowledged packets close to the tail may have fewer than 272 kReorderingThreshold number of ackable packets sent after them. Loss 273 of such packets cannot be detected via Fast Retransmit. To enable 274 ack-based loss detection of such packets, receipt of an 275 acknowledgment for the last outstanding ackable packet triggers the 276 Early Retransmit process, as follows. 278 If there are unacknowledged ackable packets still pending, they ought 279 to be marked as lost. To compensate for the reduced reordering 280 resilience, the sender SHOULD set an alarm for a small period of 281 time. If the unacknowledged ackable packets are not acknowledged 282 during this time, then these packets MUST be marked as lost. 284 An endpoint SHOULD set the alarm such that a packet is marked as lost 285 no earlier than 1.25 * max(SRTT, latest_RTT) since when it was sent. 287 Using max(SRTT, latest_RTT) protects from the two following cases: 289 o the latest RTT sample is lower than the SRTT, perhaps due to 290 reordering where packet whose ack triggered the Early Retransit 291 process encountered a shorter path; 293 o the latest RTT sample is higher than the SRTT, perhaps due to a 294 sustained increase in the actual RTT, but the smoothed SRTT has 295 not yet caught up. 297 The 1.25 multiplier increases reordering resilience. Implementers 298 MAY experiment with using other multipliers, bearing in mind that a 299 lower multiplier reduces reordering resilience and increases spurious 300 retransmissions, and a higher multipler increases loss recovery 301 delay. 303 This mechanism is based on Early Retransmit for TCP [RFC5827]. 304 However, [RFC5827] does not include the alarm described above. Early 305 Retransmit is prone to spurious retransmissions due to its reduced 306 reordering resilence without the alarm. This observation led Linux 307 TCP implementers to implement an alarm for TCP as well, and this 308 document incorporates this advancement. 310 3.3. Timer-based Detection 312 Timer-based loss detection implements the spirit of TCP's Tail Loss 313 Probe and Retransmission Timeout mechanisms. 315 3.3.1. Tail Loss Probe 317 The algorithm described in this section is an adaptation of the Tail 318 Loss Probe algorithm proposed for TCP [TLP]. 320 A packet sent at the tail is particularly vulnerable to slow loss 321 detection, since acks of subsequent packets are needed to trigger 322 ack-based detection. To ameliorate this weakness of tail packets, 323 the sender schedules an alarm when the last ackable packet before 324 quiescence is transmitted. When this alarm fires, a Tail Loss Probe 325 (TLP) packet is sent to evoke an acknowledgement from the receiver. 327 The alarm duration, or Probe Timeout (PTO), is set based on the 328 following conditions: 330 o If there is exactly one unacknowledged packet, PTO SHOULD be 331 scheduled for max(2_SRTT, 1.5_SRTT+kDelayedAckTimeout) 333 o If there are more than one unacknowledged packets, PTO SHOULD be 334 scheduled for max(2*SRTT, 10ms). 336 o If RTO is earlier, schedule a TLP alarm in its place. That is, 337 PTO SHOULD be scheduled for min(RTO, PTO). 339 kDelayedAckTimeout is the expected delayed ACK timer. When there is 340 exactly one unacknowledged packet, the alarm duration includes time 341 for an acknowledgment to be received, and additionally, a 342 kDelayedAckTimeout period to compensate for the delayed 343 acknowledgment timer at the receiver. 345 The RECOMMENDED value for kDelayedAckTimeout is 25ms. 347 (TODO: Add negotiability of delayed ack timeout.) 349 A PTO value of at least 2_SRTT ensures that the ACK is overdue. 350 Using a PTO of exactly 1_SRTT may generate spurious probes, and 351 2*SRTT is simply the next integral value of RTT. 353 (TODO: These values of 2 and 1.5 are a bit arbitrary. Reconsider 354 these.) 356 If the Retransmission Timeout (RTO, Section 3.3.2) period is smaller 357 than the computed PTO, then a PTO is scheduled for the smaller RTO 358 period. 360 To reduce latency, it is RECOMMENDED that the sender set and allow 361 the TLP alarm to fire twice before setting an RTO alarm. In other 362 words, when the TLP alarm fires the first time, a TLP packet is sent, 363 and it is RECOMMENDED that the TLP alarm be scheduled for a second 364 time. When the TLP alarm fires the second time, a second TLP packet 365 is sent, and an RTO alarm SHOULD be scheduled Section 3.3.2. 367 A TLP packet SHOULD carry new data when possible. If new data is 368 unavailable or new data cannot be sent due to flow control, a TLP 369 packet MAY retransmit unacknowledged data to potentially reduce 370 recovery time. Since a TLP alarm is used to send a probe into the 371 network prior to establishing any packet loss, prior unacknowledged 372 packets SHOULD NOT be marked as lost when a TLP alarm fires. 374 A TLP packet MUST NOT be blocked by the sender's congestion 375 controller. The sender MUST however count these bytes as additional 376 bytes in flight, since a TLP adds network load without establishing 377 packet loss. 379 A sender will commonly not know that a packet being sent is a tail 380 packet. Consequently, a sender may have to arm or adjust the TLP 381 alarm on every sent ackable packet. 383 3.3.2. Retransmission Timeout 385 A Retransmission Timeout (RTO) alarm is the final backstop for loss 386 detection. The algorithm used in QUIC is based on the RTO algorithm 387 for TCP [RFC5681] and is additionally resilient to spurious RTO 388 events [RFC5682]. 390 When the last TLP packet is sent, an alarm is scheduled for the RTO 391 period. When this alarm fires, the sender sends two packets, to 392 evoke acknowledgements from the receiver, and restarts the RTO alarm. 394 Similar to TCP [RFC6298], the RTO period is set based on the 395 following conditions: 397 o When the final TLP packet is sent, the RTO period is set to 398 max(SRTT + 4*RTTVAR, minRTO) 400 o When an RTO alarm fires, the RTO period is doubled. 402 The sender typically has incurred a high latency penalty by the time 403 an RTO alarm fires, and this penalty increases exponentially in 404 subsequent consecutive RTO events. Sending a single packet on an RTO 405 event therefore makes the connection very sensitive to single packet 406 loss. Sending two packets instead of one significantly increases 407 resilience to packet drop in both directions, thus reducing the 408 probability of consecutive RTO events. 410 QUIC's RTO algorithm differs from TCP in that the firing of an RTO 411 alarm is not considered a strong enough signal of packet loss. An 412 RTO alarm fires only when there's a prolonged period of network 413 silence, which could be caused by a change in the underlying network 414 RTT. 416 When an acknowledgment is received for a packet sent on an RTO event, 417 any unacknowledged packets with lower packet numbers than those 418 acknowledged MUST be marked as lost. 420 A packet sent when an RTO alarm fires MAY carry new data if available 421 or unacknowledged data to potentially reduce recovery time. Since 422 this packet is sent as a probe into the network prior to establishing 423 any packet loss, prior unacknowledged packets SHOULD NOT be marked as 424 lost. 426 A packet sent on an RTO alarm MUST NOT be blocked by the sender's 427 congestion controller. A sender MUST however count these bytes as 428 additional bytes in flight, since this packet adds network load 429 without establishing packet loss. 431 3.3.3. Handshake Timeout 433 Handshake packets, which contain STREAM frames for stream 0, are 434 critical to QUIC transport and crypto negotiation, so a separate 435 alarm is used for them. 437 The handshake timeout SHOULD be set to twice the initial RTT. 439 There are no prior RTT samples within this connection. However, this 440 may be a resumed connection over the same network, in which case, a 441 client SHOULD use the previous connection's final smoothed RTT value 442 as the resumed connection's initial RTT. 444 If no previous RTT is available, or if the network changes, the 445 initial RTT SHOULD be set to 100ms. 447 When the first handshake packet is sent, the sender SHOULD set an 448 alarm for the handshake timeout period. 450 When the alarm fires, the sender MUST retransmit all unacknowledged 451 handshake frames. The sender SHOULD double the handshake timeout and 452 set an alarm for this period. 454 On each consecutive firing of the handshake alarm, the sender SHOULD 455 double the handshake timeout period. 457 When an acknowledgement is received for a handshake packet, the new 458 RTT is computed and the alarm SHOULD be set for twice the newly 459 computed smoothed RTT. 461 Handshake frames may be cancelled by handshake state transitions. In 462 particular, all non-protected frames SHOULD no longer be transmitted 463 once packet protection is available. 465 (TODO: Work this section some more. Add text on client vs. server, 466 and on stateless retry.) 468 3.4. Algorithm Details 470 3.4.1. Constants of interest 472 Constants used in loss recovery are based on a combination of RFCs, 473 papers, and common practice. Some may need to be changed or 474 negotiated in order to better suit a variety of environments. 476 kMaxTLPs (default 2): Maximum number of tail loss probes before an 477 RTO fires. 479 kReorderingThreshold (default 3): Maximum reordering in packet 480 number space before FACK style loss detection considers a packet 481 lost. 483 kTimeReorderingFraction (default 1/8): Maximum reordering in time 484 space before time based loss detection considers a packet lost. 485 In fraction of an RTT. 487 kMinTLPTimeout (default 10ms): Minimum time in the future a tail 488 loss probe alarm may be set for. 490 kMinRTOTimeout (default 200ms): Minimum time in the future an RTO 491 alarm may be set for. 493 kDelayedAckTimeout (default 25ms): The length of the peer's delayed 494 ack timer. 496 kDefaultInitialRtt (default 100ms): The default RTT used before an 497 RTT sample is taken. 499 3.4.2. Variables of interest 501 Variables required to implement the congestion control mechanisms are 502 described in this section. 504 loss_detection_alarm: Multi-modal alarm used for loss detection. 506 handshake_count: The number of times the handshake packets have been 507 retransmitted without receiving an ack. 509 tlp_count: The number of times a tail loss probe has been sent 510 without receiving an ack. 512 rto_count: The number of times an rto has been sent without 513 receiving an ack. 515 largest_sent_before_rto: The last packet number sent prior to the 516 first retransmission timeout. 518 time_of_last_sent_packet: The time the most recent packet was sent. 520 largest_sent_packet: The packet number of the most recently sent 521 packet. 523 largest_acked_packet: The largest packet number acknowledged in an 524 ack frame. 526 latest_rtt: The most recent RTT measurement made when receiving an 527 ack for a previously unacked packet. 529 smoothed_rtt: The smoothed RTT of the connection, computed as 530 described in [RFC6298] 532 rttvar: The RTT variance, computed as described in [RFC6298] 534 reordering_threshold: The largest delta between the largest acked 535 retransmittable packet and a packet containing retransmittable 536 frames before it's declared lost. 538 time_reordering_fraction: The reordering window as a fraction of 539 max(smoothed_rtt, latest_rtt). 541 loss_time: The time at which the next packet will be considered lost 542 based on early transmit or exceeding the reordering window in 543 time. 545 sent_packets: An association of packet numbers to information about 546 them, including a number field indicating the packet number, a 547 time field indicating the time a packet was sent, and a bytes 548 field indicating the packet's size. sent_packets is ordered by 549 packet number, and packets remain in sent_packets until 550 acknowledged or lost. 552 3.4.3. Initialization 554 At the beginning of the connection, initialize the loss detection 555 variables as follows: 557 loss_detection_alarm.reset() 558 handshake_count = 0 559 tlp_count = 0 560 rto_count = 0 561 if (UsingTimeLossDetection()) 562 reordering_threshold = infinite 563 time_reordering_fraction = kTimeReorderingFraction 564 else: 565 reordering_threshold = kReorderingThreshold 566 time_reordering_fraction = infinite 567 loss_time = 0 568 smoothed_rtt = 0 569 rttvar = 0 570 largest_sent_before_rto = 0 571 time_of_last_sent_packet = 0 572 largest_sent_packet = 0 574 3.4.4. On Sending a Packet 576 After any packet is sent, be it a new transmission or a rebundled 577 transmission, the following OnPacketSent function is called. The 578 parameters to OnPacketSent are as follows: 580 o packet_number: The packet number of the sent packet. 582 o is_ack_only: A boolean that indicates whether a packet only 583 contains an ACK frame. If true, it is still expected an ack will 584 be received for this packet, but it is not congestion controlled. 586 o sent_bytes: The number of bytes sent in the packet, not including 587 UDP or IP overhead, but including QUIC framing overhead. 589 Pseudocode for OnPacketSent follows: 591 OnPacketSent(packet_number, is_ack_only, sent_bytes): 592 time_of_last_sent_packet = now 593 largest_sent_packet = packet_number 594 sent_packets[packet_number].packet_number = packet_number 595 sent_packets[packet_number].time = now 596 if !is_ack_only: 597 OnPacketSentCC(sent_bytes) 598 sent_packets[packet_number].bytes = sent_bytes 599 SetLossDetectionAlarm() 601 3.4.5. On Ack Receipt 603 When an ack is received, it may acknowledge 0 or more packets. 605 Pseudocode for OnAckReceived and UpdateRtt follow: 607 OnAckReceived(ack): 608 largest_acked_packet = ack.largest_acked 609 // If the largest acked is newly acked, update the RTT. 610 if (sent_packets[ack.largest_acked]): 611 latest_rtt = now - sent_packets[ack.largest_acked].time 612 if (latest_rtt > ack.ack_delay): 613 latest_rtt -= ack.delay 614 UpdateRtt(latest_rtt) 615 // Find all newly acked packets. 616 for acked_packet in DetermineNewlyAckedPackets(): 617 OnPacketAcked(acked_packet.packet_number) 619 DetectLostPackets(ack.largest_acked_packet) 620 SetLossDetectionAlarm() 622 UpdateRtt(latest_rtt): 623 // Based on {{RFC6298}}. 624 if (smoothed_rtt == 0): 625 smoothed_rtt = latest_rtt 626 rttvar = latest_rtt / 2 627 else: 628 rttvar = 3/4 * rttvar + 1/4 * abs(smoothed_rtt - latest_rtt) 629 smoothed_rtt = 7/8 * smoothed_rtt + 1/8 * latest_rtt 631 3.4.6. On Packet Acknowledgment 633 When a packet is acked for the first time, the following 634 OnPacketAcked function is called. Note that a single ACK frame may 635 newly acknowledge several packets. OnPacketAcked must be called once 636 for each of these newly acked packets. 638 OnPacketAcked takes one parameter, acked_packet, which is the packet 639 number of the newly acked packet, and returns a list of packet 640 numbers that are detected as lost. 642 If this is the first acknowledgement following RTO, check if the 643 smallest newly acknowledged packet is one sent by the RTO, and if so, 644 inform congestion control of a verified RTO, similar to F-RTO 645 [RFC5682] 647 Pseudocode for OnPacketAcked follows: 649 OnPacketAcked(acked_packet_number): 650 OnPacketAckedCC(acked_packet_number) 651 // If a packet sent prior to RTO was acked, then the RTO 652 // was spurious. Otherwise, inform congestion control. 653 if (rto_count > 0 && 654 acked_packet_number > largest_sent_before_rto) 655 OnRetransmissionTimeoutVerified() 656 handshake_count = 0 657 tlp_count = 0 658 rto_count = 0 659 sent_packets.remove(acked_packet_number) 661 3.4.7. Setting the Loss Detection Alarm 663 QUIC loss detection uses a single alarm for all timer-based loss 664 detection. The duration of the alarm is based on the alarm's mode, 665 which is set in the packet and timer events further below. The 666 function SetLossDetectionAlarm defined below shows how the single 667 timer is set based on the alarm mode. 669 3.4.7.1. Handshake Packets 671 The initial flight has no prior RTT sample. A client SHOULD remember 672 the previous RTT it observed when resumption is attempted and use 673 that for an initial RTT value. If no previous RTT is available, the 674 initial RTT defaults to 100ms. 676 Endpoints MUST retransmit handshake frames if not acknowledged within 677 a time limit. This time limit will start as the largest of twice the 678 RTT value and MinTLPTimeout. Each consecutive handshake 679 retransmission doubles the time limit, until an acknowledgement is 680 received. 682 Handshake frames may be cancelled by handshake state transitions. In 683 particular, all non-protected frames SHOULD be no longer be 684 transmitted once packet protection is available. 686 When stateless rejects are in use, the connection is considered 687 immediately closed once a reject is sent, so no timer is set to 688 retransmit the reject. 690 Version negotiation packets are always stateless, and MUST be sent 691 once per handshake packet that uses an unsupported QUIC version, and 692 MAY be sent in response to 0RTT packets. 694 3.4.7.2. Tail Loss Probe and Retransmission Timeout 696 Tail loss probes [LOSS-PROBE] and retransmission timeouts [RFC6298] 697 are an alarm based mechanism to recover from cases when there are 698 outstanding retransmittable packets, but an acknowledgement has not 699 been received in a timely manner. 701 3.4.7.3. Early Retransmit 703 Early retransmit [RFC5827] is implemented with a 1/4 RTT timer. It 704 is part of QUIC's time based loss detection, but is always enabled, 705 even when only packet reordering loss detection is enabled. 707 3.4.7.4. Pseudocode 709 Pseudocode for SetLossDetectionAlarm follows: 711 SetLossDetectionAlarm(): 712 if (retransmittable packets are not outstanding): 713 loss_detection_alarm.cancel() 714 return 716 if (handshake packets are outstanding): 717 // Handshake retransmission alarm. 718 if (smoothed_rtt == 0): 719 alarm_duration = 2 * kDefaultInitialRtt 720 else: 721 alarm_duration = 2 * smoothed_rtt 722 alarm_duration = max(alarm_duration, kMinTLPTimeout) 723 alarm_duration = alarm_duration * (2 ^ handshake_count) 724 else if (loss_time != 0): 725 // Early retransmit timer or time loss detection. 726 alarm_duration = loss_time - now 727 else if (tlp_count < kMaxTLPs): 728 // Tail Loss Probe 729 if (retransmittable_packets_outstanding == 1): 730 alarm_duration = 1.5 * smoothed_rtt + kDelayedAckTimeout 731 else: 732 alarm_duration = kMinTLPTimeout 733 alarm_duration = max(alarm_duration, 2 * smoothed_rtt) 734 else: 735 // RTO alarm 736 alarm_duration = smoothed_rtt + 4 * rttvar 737 alarm_duration = max(alarm_duration, kMinRTOTimeout) 738 alarm_duration = alarm_duration * (2 ^ rto_count) 740 loss_detection_alarm.set(now + alarm_duration) 742 3.4.8. On Alarm Firing 744 QUIC uses one loss recovery alarm, which when set, can be in one of 745 several modes. When the alarm fires, the mode determines the action 746 to be performed. 748 Pseudocode for OnLossDetectionAlarm follows: 750 OnLossDetectionAlarm(): 751 if (handshake packets are outstanding): 752 // Handshake retransmission alarm. 753 RetransmitAllHandshakePackets() 754 handshake_count++ 755 else if (loss_time != 0): 756 // Early retransmit or Time Loss Detection 757 DetectLostPackets(largest_acked_packet) 758 else if (tlp_count < kMaxTLPs): 759 // Tail Loss Probe. 760 SendOnePacket() 761 tlp_count++ 762 else: 763 // RTO. 764 if (rto_count == 0) 765 largest_sent_before_rto = largest_sent_packet 766 SendTwoPackets() 767 rto_count++ 769 SetLossDetectionAlarm() 771 3.4.9. Detecting Lost Packets 773 Packets in QUIC are only considered lost once a larger packet number 774 is acknowledged. DetectLostPackets is called every time an ack is 775 received. If the loss detection alarm fires and the loss_time is 776 set, the previous largest acked packet is supplied. 778 3.4.9.1. Handshake Packets 780 The receiver MUST close the connection with an error of type 781 OPTIMISTIC_ACK when receiving an unprotected packet that acks 782 protected packets. The receiver MUST trust protected acks for 783 unprotected packets, however. Aside from this, loss detection for 784 handshake packets when an ack is processed is identical to other 785 packets. 787 3.4.9.2. Pseudocode 789 DetectLostPackets takes one parameter, acked, which is the largest 790 acked packet. 792 Pseudocode for DetectLostPackets follows: 794 DetectLostPackets(largest_acked): 795 loss_time = 0 796 lost_packets = {} 797 delay_until_lost = infinite 798 if (time_reordering_fraction != infinite): 799 delay_until_lost = 800 (1 + time_reordering_fraction) * max(latest_rtt, smoothed_rtt) 801 else if (largest_acked.packet_number == largest_sent_packet): 802 // Early retransmit alarm. 803 delay_until_lost = 9/8 * max(latest_rtt, smoothed_rtt) 804 foreach (unacked < largest_acked.packet_number): 805 time_since_sent = now() - unacked.time_sent 806 delta = largest_acked.packet_number - unacked.packet_number 807 if (time_since_sent > delay_until_lost): 808 lost_packets.insert(unacked) 809 else if (delta > reordering_threshold) 810 lost_packets.insert(unacked) 811 else if (loss_time == 0 && delay_until_lost != infinite): 812 loss_time = now() + delay_until_lost - time_since_sent 814 // Inform the congestion controller of lost packets and 815 // lets it decide whether to retransmit immediately. 816 if (!lost_packets.empty()) 817 OnPacketsLost(lost_packets) 818 foreach (packet in lost_packets) 819 sent_packets.remove(packet.packet_number) 821 3.5. Discussion 823 The majority of constants were derived from best common practices 824 among widely deployed TCP implementations on the internet. 825 Exceptions follow. 827 A shorter delayed ack time of 25ms was chosen because longer delayed 828 acks can delay loss recovery and for the small number of connections 829 where less than packet per 25ms is delivered, acking every packet is 830 beneficial to congestion control and loss recovery. 832 The default initial RTT of 100ms was chosen because it is slightly 833 higher than both the median and mean min_rtt typically observed on 834 the public internet. 836 4. Congestion Control 838 QUIC's congestion control is based on TCP NewReno[RFC6582] congestion 839 control to determine the congestion window and pacing rate. QUIC 840 congestion control is specified in bytes due to finer control and the 841 ease of appropriate byte counting[RFC3465]. 843 4.1. Slow Start 845 QUIC begins every connection in slow start and exits slow start upon 846 loss. QUIC re-enters slow start after a retransmission timeout. 847 While in slow start, QUIC increases the congestion window by the 848 number of acknowledged bytes when each ack is processed. 850 4.2. Congestion Avoidance 852 Slow start exits to congestion avoidance. Congestion avoidance in 853 NewReno uses an additive increase multiplicative decrease (AIMD) 854 approach that increases the congestion window by one MSS of bytes per 855 congestion window acknowledged. When a loss is detected, NewReno 856 halves the congestion window and sets the slow start threshold to the 857 new congestion window. 859 4.3. Recovery Period 861 Recovery is a period of time beginning with detection of a lost 862 packet. Because QUIC retransmits stream data and control frames, not 863 packets, it defines the end of recovery as a packet sent after the 864 start of recovery being acknowledged. This is slightly different 865 from TCP's definition of recovery ending when the lost packet that 866 started recovery is acknowledged. 868 During recovery, the congestion window is not increased or decreased. 869 As such, multiple lost packets only decrease the congestion window 870 once as long as they're lost before exiting recovery. This causes 871 QUIC to decrease the congestion window multiple times if 872 retransmisions are lost, but limits the reduction to once per round 873 trip. 875 4.4. Tail Loss Probe 877 If recovery sends a tail loss probe, no change is made to the 878 congestion window or pacing rate. Acknowledgement or loss of tail 879 loss probes are treated like any other packet. 881 4.5. Retransmission Timeout 883 When retransmissions are sent due to a retransmission timeout alarm, 884 no change is made to the congestion window or pacing rate until the 885 next acknowledgement arrives. When an ack arrives, if packets prior 886 to the first retransmission timeout are acknowledged, then the 887 congestion window remains the same. If no packets prior to the first 888 retransmission timeout are acknowledged, the retransmission timeout 889 has been validated and the congestion window must be reduced to the 890 minimum congestion window and slow start is begun. 892 4.6. Pacing Rate 894 The pacing rate is a function of the mode, the congestion window, and 895 the smoothed rtt. Specifically, the pacing rate is 2 times the 896 congestion window divided by the smoothed RTT during slow start and 897 1.25 times the congestion window divided by the smoothed RTT during 898 congestion avoidance. In order to fairly compete with flows that are 899 not pacing, it is recommended to not pace the first 10 sent packets 900 when exiting quiescence. 902 4.7. Pseudocode 904 4.7.1. Constants of interest 906 Constants used in congestion control are based on a combination of 907 RFCs, papers, and common practice. Some may need to be changed or 908 negotiated in order to better suit a variety of environments. 910 kDefaultMss (default 1460 bytes): The default max packet size used 911 for calculating default and minimum congestion windows. 913 kInitialWindow (default 10 * kDefaultMss): Default limit on the 914 amount of outstanding data in bytes. 916 kMinimumWindow (default 2 * kDefaultMss): Default minimum congestion 917 window. 919 kLossReductionFactor (default 0.5): Reduction in congestion window 920 when a new loss event is detected. 922 4.7.2. Variables of interest 924 Variables required to implement the congestion control mechanisms are 925 described in this section. 927 bytes_in_flight: The sum of the size in bytes of all sent packets 928 that contain at least one retransmittable or PADDING frame, and 929 have not been acked or declared lost. The size does not include 930 IP or UDP overhead. Packets only containing ack frames do not 931 count towards byte_in_flight to ensure congestion control does not 932 impede congestion feedback. 934 congestion_window: Maximum number of bytes in flight that may be 935 sent. 937 end_of_recovery: The largest packet number sent when QUIC detects a 938 loss. When a larger packet is acknowledged, QUIC exits recovery. 940 ssthresh Slow start threshold in bytes. When the congestion window 941 is below ssthresh, the mode is slow start and the window grows by 942 the number of bytes acknowledged. 944 4.7.3. Initialization 946 At the beginning of the connection, initialize the congestion control 947 variables as follows: 949 congestion_window = kInitialWindow 950 bytes_in_flight = 0 951 end_of_recovery = 0 952 ssthresh = infinite 954 4.7.4. On Packet Sent 956 Whenever a packet is sent, and it contains non-ACK frames, the packet 957 increases bytes_in_flight. 959 OnPacketSentCC(bytes_sent): 960 bytes_in_flight += bytes_sent 962 4.7.5. On Packet Acknowledgement 964 Invoked from loss detection's OnPacketAcked and is supplied with 965 acked_packet from sent_packets. 967 OnPacketAckedCC(acked_packet): 968 // Remove from bytes_in_flight. 969 bytes_in_flight -= acked_packet.bytes 970 if (acked_packet.packet_number < end_of_recovery): 971 // Do not increase congestion window in recovery period. 972 return 973 if (congestion_window < ssthresh): 974 // Slow start. 975 congestion_window += acked_packets.bytes 976 else: 977 // Congestion avoidance. 978 congestion_window += 979 kDefaultMss * acked_packets.bytes / congestion_window 981 4.7.6. On Packets Lost 983 Invoked by loss detection from DetectLostPackets when new packets are 984 detected lost. 986 OnPacketsLost(lost_packets): 987 // Remove lost packets from bytes_in_flight. 988 for (lost_packet : lost_packets): 989 bytes_in_flight -= lost_packet.bytes 990 largest_lost_packet = lost_packets.last() 991 // Start a new recovery epoch if the lost packet is larger 992 // than the end of the previous recovery epoch. 993 if (end_of_recovery < largest_lost_packet.packet_number): 994 end_of_recovery = largest_sent_packet 995 congestion_window *= kLossReductionFactor 996 congestion_window = max(congestion_window, kMinimumWindow) 997 ssthresh = congestion_window 999 4.7.7. On Retransmission Timeout Verified 1001 QUIC decreases the congestion window to the minimum value once the 1002 retransmission timeout has been verified. 1004 OnRetransmissionTimeoutVerified() 1005 congestion_window = kMinimumWindow 1007 5. IANA Considerations 1009 This document has no IANA actions. Yet. 1011 6. References 1013 6.1. Normative References 1015 [QUIC-TRANSPORT] 1016 Iyengar, J., Ed. and M. Thomson, Ed., "QUIC: A UDP-Based 1017 Multiplexed and Secure Transport", draft-ietf-quic- 1018 transport-07 (work in progress), November 2017. 1020 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1021 Requirement Levels", BCP 14, RFC 2119, 1022 DOI 10.17487/RFC2119, March 1997, 1023 . 1025 [RFC4653] Bhandarkar, S., Reddy, A., Allman, M., and E. Blanton, 1026 "Improving the Robustness of TCP to Non-Congestion 1027 Events", RFC 4653, DOI 10.17487/RFC4653, August 2006, 1028 . 1030 [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion 1031 Control", RFC 5681, DOI 10.17487/RFC5681, September 2009, 1032 . 1034 [RFC5682] Sarolahti, P., Kojo, M., Yamamoto, K., and M. Hata, 1035 "Forward RTO-Recovery (F-RTO): An Algorithm for Detecting 1036 Spurious Retransmission Timeouts with TCP", RFC 5682, 1037 DOI 10.17487/RFC5682, September 2009, 1038 . 1040 [RFC5827] Allman, M., Avrachenkov, K., Ayesta, U., Blanton, J., and 1041 P. Hurtig, "Early Retransmit for TCP and Stream Control 1042 Transmission Protocol (SCTP)", RFC 5827, 1043 DOI 10.17487/RFC5827, May 2010, 1044 . 1046 [RFC6298] Paxson, V., Allman, M., Chu, J., and M. Sargent, 1047 "Computing TCP's Retransmission Timer", RFC 6298, 1048 DOI 10.17487/RFC6298, June 2011, 1049 . 1051 [RFC6675] Blanton, E., Allman, M., Wang, L., Jarvinen, I., Kojo, M., 1052 and Y. Nishida, "A Conservative Loss Recovery Algorithm 1053 Based on Selective Acknowledgment (SACK) for TCP", 1054 RFC 6675, DOI 10.17487/RFC6675, August 2012, 1055 . 1057 6.2. Informative References 1059 [LOSS-PROBE] 1060 Dukkipati, N., Cardwell, N., Cheng, Y., and M. Mathis, 1061 "Tail Loss Probe (TLP): An Algorithm for Fast Recovery of 1062 Tail Losses", draft-dukkipati-tcpm-tcp-loss-probe-01 (work 1063 in progress), February 2013. 1065 [RFC3465] Allman, M., "TCP Congestion Control with Appropriate Byte 1066 Counting (ABC)", RFC 3465, DOI 10.17487/RFC3465, February 1067 2003, . 1069 [RFC6582] Henderson, T., Floyd, S., Gurtov, A., and Y. Nishida, "The 1070 NewReno Modification to TCP's Fast Recovery Algorithm", 1071 RFC 6582, DOI 10.17487/RFC6582, April 2012, 1072 . 1074 [TLP] Dukkipati, N., Cardwell, N., Cheng, Y., and M. Mathis, 1075 "Tail Loss Probe (TLP): An Algorithm for Fast Recovery of 1076 Tail Losses", draft-dukkipati-tcpm-tcp-loss-probe-01 (work 1077 in progress), February 2013. 1079 6.3. URIs 1081 [1] https://mailarchive.ietf.org/arch/search/?email_list=quic 1083 [2] https://github.com/quicwg 1085 [3] https://github.com/quicwg/base-drafts/labels/recovery 1087 Appendix A. Acknowledgments 1089 Appendix B. Change Log 1091 *RFC Editor's Note:* Please remove this section prior to 1092 publication of a final version of this document. 1094 B.1. Since draft-ietf-quic-recovery-06 1096 Nothing yet. 1098 B.2. Since draft-ietf-quic-recovery-05 1100 o Add more congestion control text (#776) 1102 B.3. Since draft-ietf-quic-recovery-04 1104 No significant changes. 1106 B.4. Since draft-ietf-quic-recovery-03 1108 No significant changes. 1110 B.5. Since draft-ietf-quic-recovery-02 1112 o Integrate F-RTO (#544, #409) 1114 o Add congestion control (#545, #395) 1116 o Require connection abort if a skipped packet was acknowledged 1117 (#415) 1119 o Simplify RTO calculations (#142, #417) 1121 B.6. Since draft-ietf-quic-recovery-01 1123 o Overview added to loss detection 1125 o Changes initial default RTT to 100ms 1127 o Added time-based loss detection and fixes early retransmit 1129 o Clarified loss recovery for handshake packets 1131 o Fixed references and made TCP references informative 1133 B.7. Since draft-ietf-quic-recovery-00 1135 o Improved description of constants and ACK behavior 1137 B.8. Since draft-iyengar-quic-loss-recovery-01 1139 o Adopted as base for draft-ietf-quic-recovery 1141 o Updated authors/editors list 1143 o Added table of contents 1145 Authors' Addresses 1146 Jana Iyengar (editor) 1147 Google 1149 Email: jri@google.com 1151 Ian Swett (editor) 1152 Google 1154 Email: ianswett@google.com