Internet-Draft Congestion Control Convergence December 2023
Kuhn, et al. Expires 10 June 2024
Workgroup: Internet Engineering Task Force
Internet-Draft: draft-ietf-tsvwg-careful-resume-05
Published: December 2023
Intended Status: Standards Track
Expires: 10 June 2024
Authors:
N. Kuhn
Thales Alenia Space
E. Stephan
Orange
G. Fairhurst
University of Aberdeen
R. Secchi
University of Aberdeen
C. Huitema
Private Octopus Inc.

Convergence of Congestion Control from Retained State

Abstract

This document specifies a cautious method for IETF transports that enables fast startup of congestion control for a wide range of connections. It reuses a set of computed congestion control parameters that are based on previously observed path characteristics between the same pair of transport endpoints. These parameters are stored, allowing them to be later used to modify the congestion control behavior of a subsequent connection.

It describes assumptions and defines requirements for how a sender utilizes these parameters to provide opportunities for a connection to more rapidly get up to speed and rapidly utilize available capacity. It discusses how use of the method impacts the capacity at a shared network bottleneck and the safe response that is needed after any indication that the new rate is inappropriate.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 10 June 2024.

1. Introduction

All Internet transports are required to either use a Congestion Control (CC) method, or to constrain their rate of transmission [RFC8085]. In 2010, a survey of alternative CC methods [RFC5783] noted that there are challenges when a CC method operates across an Internet path with a high and/or varying Bandwidth-Delay Product (BDP). The method specified in this document targets these challenges.

A CC method typically takes time to ramp up the sending rate. This period is called the "slow-start phase", informally known as the time to "get up to speed". The slow-start phase defines a time in which a sender intentionally uses less capacity than might be available, with the intention to avoid or limit overshooting the available capacity for the path. The slow-start design can increase queuing (latency/jitter) and/or congestion packet loss for the flow. Any overshoot can have a detrimental effect on other flows sharing a common bottleneck. A sender can use a method to observe the rate of acknowledged data, and seek to avoid overshooting the bottleneck capacity (e.g., HyStart++ [RFC9406]). In the extreme case, an overshoot can result in persistent congestion with unwanted starvation of other flows [RFC8867] (i.e., preventing other flows from successfully sharing the capacity at a common bottleneck).

This document proposes a CC method that is expected to reduce the time to complete a transfer when the transfer sends significantly more data than allowed by the Initial congestion Window (IW), and where the BDP of the path is also significantly more than the IW. It introduces an alternative method to select initial CC parameters that seeks to more rapidly and safely grow the sending rate controlled by the congestion window (CWND). CC methods that are rate-based can make similar adjustments to their target sending rate.

This method is based on temporal sharing (sometimes known as caching) of a saved set of CC parameters that relate to previous observations of the same path. The parameters include: the saved_cwnd for the path and the minimum Round Trip Time (RTT). These parameters are stored and used to modify the CC behavior of a subsequent connection between the same endpoints.

When used with the QUIC transport, this provides transport services that resemble those currently available in TCP, using methods such as TCP Control Block (TCB) [RFC9040] caching.

1.1. Use of saved CC parameters by a Sender

CC parameters are used by Careful Resume for three functions:

  1. Information about the utilised path capacity (saved_cwnd) to determine an appropriate set of CC parameters for re-using the path.

  2. Information to characterize the saved path to confirm whether the current path is consistent with a saved path.

  3. Information to check the validity of the saved CC parameters, including the time for which the parameters remain valid.

"Generally, implementations are advised to be cautious when using saved CC parameters on a new path", as stated in [RFC9000]. While this statement has been proposed in the context of QUIC standardization, this advice is appropriate for any IETF transport protocol. Care is therefore needed to assure safe use and to be robust to changes in traffic patterns, network routing, and link/node conditions. There are cases where using the saved parameters of a previous connection is not appropriate (e.g., Section 3.2).

1.2. Receiver Preference

Whilst a sender could take optimization decisions without considering the receiver's preference, there are cases where a receiver could have information that is not available at the sender, or might benefit from understanding that Careful Resume might be used. In these cases, a receiver could explicitly ask to enable or inhibit tuning of the CC when an application initiates a new session or resumes an existing one. A receiver could also tune policies for using the connection (e.g., managing the receiver window or flow credit).

Examples where a receiver might request not to use Careful Resume include:

  1. a receiver that can predict the pattern of traffic (e.g., insight into the volume of data to be sent, the expected length of a session, or the required maximum transfer rate);

  2. a receiver with a local indication that a path/local interface has changed since the CC parameters were stored;

  3. knowledge of the current hardware limitations at a receiver;

  4. a receiver that can predict additional capacity will be needed for other concurrent or later flows (i.e., prefers to activate Careful Resume for a different connection).

A related document proposes an extension for QUIC that allows sender-generated CC parameters to be stored at the receiver [I-D.kuhn-quic-bdpframe-extension]. Transferring the information to a receiver removes the need for a sender to retain transport state for each receiver, and allows a receiver to express a preference for whether to use the method.

1.3. Examples of Scenarios of Interest

This section provides a set of examples where Careful Resume is expected to improve performance.

Either endpoint can assume the role of a sender or a receiver. Careful Resume also supports a bidirectional data transfer, where both endpoints simultaneously send data (e.g., remote execution of an application, or a bidirectional video conference call).

In one example, an application uses a series of connections over a path (i.e., resumes a connection to the same endpoint). Without a new method, each connection would need to individually discover appropriate CC parameters, whereas Careful Resume allows the flow to use a rate that is based on the previously observed CC parameters.

In another example, an application connects after a disruption had temporarily reduced the path capacity (e.g., after a link propagation impairment, or where a user on a train journey travels through different areas of connectivity). When the endpoint returns to a path with the original characteristics, Careful Resume allows the flow to use a rate that is based on the previously observed CC parameters.

There is particular benefit for any path with an RTT that is much larger than typical Internet paths. In a specific example, an application connected via a satellite access network [IJSCN] could require 9 seconds to complete a 5.3 MB transfer using standard CC, whereas a sender using Careful Resume could reduce this transfer time to 4 seconds. The time to complete a 1 MB transfer could similarly be reduced by 62% [MAPRG111]. This benefit is also expected for other sizes of transfer and for different path characteristics when a path has a large BDP.

{XXX-Editor note: A future revision would helpfully provide further Path Examples here.}

2. Language, Notation and Terms

This section provides a brief summary of key terms and the requirements language.

2.1. Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

2.2. Notation and Terms

The document uses language drawn from a range of IETF RFCs. It defines current and saved values for a set of CC parameters:

  • CC parameters: A set of saved congestion control parameters from a previously observed connection (see Section 1.1).

  • Careful Resume (CR): The method specified in this document to select initial CC parameters, which seeks to more rapidly and safely increase the initial sending rate.

  • current_endpoint_token: The Endpoint Token of the current receiver;

  • current_rtt: A sample measurement of the current RTT;

  • endpoint_token: An Endpoint Token identifying a path to a receiver;

  • jump_cwnd: The resumed CWND, used in the Unvalidated Phase.

  • lifetime: The time for which the saved CC parameters can be safely re-used.

  • max_jump: The maximum configured jump_cwnd;

  • pipe: The estimated capacity available to a connection at the time of congestion;

  • saved_cwnd: A value of cwnd derived from observation of a previous connection, which reflects capacity that was utilised by the observed connection;

  • saved_endpoint_token: The Endpoint Token of a previous connection to a receiver;

  • saved_rtt: The preserved minimum RTT, e.g., corresponding to the minimum of a set of RTT measurements over the last 5 minutes of a connection.

The Endpoint Token is described in Appendix A.
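
As an illustration only, the saved values listed above could be grouped into one record per Endpoint Token. The following Python sketch assumes bytes for saved_cwnd and seconds for times. The class name, the saved_at field and the expired() helper are hypothetical and are not defined by this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SavedCCParams:
    """One saved set of CC parameters (hypothetical layout)."""

    saved_endpoint_token: bytes  # Endpoint Token of the previous connection
    saved_cwnd: int              # bytes; capacity utilised by that connection
    saved_rtt: float             # seconds; preserved minimum RTT
    lifetime: float              # seconds for which the values may be re-used
    saved_at: float              # assumed field: time the values were stored

    def expired(self, now: float) -> bool:
        """Return True once the lifetime has been exceeded."""
        return now - self.saved_at > self.lifetime
```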

3. The Phases of CC using Resume

This section defines a series of phases that the CC algorithm moves through as a connection uses Careful Resume, as shown in Figure 1.


Connect -> Reconnaissance --------------------> Normal
             |                                   ^
             v                                   |
           Unvalidated --> Validating -----------+
             |               |                   |
             |               |                   |
             +---------------+--> Safe Retreat --+

Figure 1: Phases when a connection uses Careful Resume. The Observe Phase is later performed by an established connection.
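
The arrows of Figure 1 can be read as a small state machine. The Python sketch below encodes only those arrows. The names Phase, ALLOWED and next_phase are hypothetical, and the RTO case of Section 3.7 is deliberately not modelled.

```python
from enum import Enum


class Phase(Enum):
    RECONNAISSANCE = "reconnaissance"
    UNVALIDATED = "unvalidated"
    VALIDATING = "validating"
    SAFE_RETREAT = "safe_retreat"
    NORMAL = "normal"


# Transitions drawn as arrows in Figure 1 (RTO expiry is not included).
ALLOWED = {
    Phase.RECONNAISSANCE: {Phase.UNVALIDATED, Phase.NORMAL},
    Phase.UNVALIDATED: {Phase.VALIDATING, Phase.SAFE_RETREAT},
    Phase.VALIDATING: {Phase.NORMAL, Phase.SAFE_RETREAT},
    Phase.SAFE_RETREAT: {Phase.NORMAL},
    Phase.NORMAL: set(),
}


def next_phase(current: Phase, target: Phase) -> Phase:
    """Return target if Figure 1 has an arrow current -> target."""
    if target not in ALLOWED[current]:
        raise ValueError(
            f"{current.name} -> {target.name} is not an arrow in Figure 1"
        )
    return target
```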

3.1. Observe Phase

During a previous established connection, the CC parameters for the specific path to an endpoint are saved. This characterizes the path and determines the saved_cwnd. The saved_cwnd is a measure of the currently utilised capacity for the connection, measured as the number of bytes sent over an RTT. This could be computed by measuring the volume of data acknowledged in one RTT. The CC parameters also include the minimum RTT (saved_rtt) and the receiver Endpoint Token (saved_endpoint_token).

An implementation can store the CC parameters at the server (or could exchange this information with a receiver [I-D.kuhn-quic-bdpframe-extension]).

  • Observe Phase: The sender updates the stored CC parameters and/or sends the updated CC parameter information for the saved_cwnd after each observation.

  • Observe Phase (small CWND): If the measured CWND is less than four times the Initial Window (IW) (i.e., CWND less than IW*4), a sender SHOULD NOT store and/or send CC parameters.

  • Observe Phase (sending CC Parameters): When sending the CC parameters to a receiver, these ought to be updated if there are significant changes in the saved CC parameters. The frequency of updates SHOULD be less than one update per several RTTs.

Implementation notes are provided in Section 4.1.
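
The two Observe Phase checks above (the small-CWND rule and the limit on update frequency) can be sketched as a single predicate. This is a minimal Python illustration. The function name and the min_rtts_between_updates parameter are hypothetical, and its default of 4 is only an assumed reading of "several RTTs".

```python
def should_send_update(
    cwnd: int,
    iw: int,
    now: float,
    last_update: float,
    rtt: float,
    min_rtts_between_updates: int = 4,  # assumption: "several RTTs"
) -> bool:
    """Decide whether to store and/or send updated CC parameters."""
    # Small-CWND rule: nothing is saved when CWND is less than IW*4.
    if cwnd < 4 * iw:
        return False
    # Rate limit: at most one update per several RTTs.
    return (now - last_update) >= min_rtts_between_updates * rtt
```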

3.2. Reconnaissance Phase

When a sender resumes transmission, it enters the Reconnaissance Phase. In this phase, the sender transmits initial data, limited by the IW, and monitors its reception using normal CC (i.e., the CC is unchanged).

The phase seeks to determine if the path is consistent with a set of previously observed CC parameters, allowing it to either use the CR method or to revert to the Normal Phase. During this phase a sender records the current minimum RTT for the connection.

There are a set of conditions that need to be confirmed before the sender is permitted to enter the Unvalidated Phase:

  • Reconnaissance Phase (Endpoint change): If the current remote endpoint is not the same as a saved endpoint, the sender MUST enter the Normal Phase. If the Endpoint Token differs (i.e., the saved_endpoint_token is different from the current_endpoint_token), it is assumed to represent a different network path. The sender also enters the Normal Phase when there are no corresponding saved CC parameters.

  • Reconnaissance Phase (Lifetime of saved CC parameters): The CC parameters are temporal. If the lifetime of the observed CC parameters is exceeded (see Section 4.3.1), the CC parameters are no longer used and the sender enters the Normal Phase.

  • Reconnaissance Phase (Confirming RTT): Since the CC information is directly impacted by the RTT, a significant change in the minimum RTT is a strong indication that the previously observed CC parameters are not valid for the current path. An RTT measurement is confirmed when current_rtt is greater than (saved_rtt / 2) and the current_rtt is less than or equal to (saved_rtt x 10).

  • Reconnaissance Phase (Avoiding using Careful Resume): A receiver can use a method (e.g., [I-D.kuhn-quic-bdpframe-extension]) to request that the sender enters the Normal Phase.

  • {XXX-Editor note: Reconnaissance Phase (Is there a need for a minimum required number of RTT samples to confirm a path?)}

  • Reconnaissance Phase (Detected congestion): If the sender detects congestion (e.g., packet loss or ECN-CE marking), the sender does not use the Careful Resume method and MUST enter the Normal Phase to respond to the detected congestion.

  • Reconnaissance Phase (Using saved_cwnd): Only one connection can use a specific set of saved CC parameters. If another connection has already started to use the saved_cwnd, the sender MUST enter the Normal Phase.

  • Reconnaissance Phase (Data-limited sender): If the sender is data limited [RFC7661], it might send insufficient data to be able to validate transmission at the higher rate. Careful Resume is allowed to remain in the Reconnaissance Phase and to not transition to the Unvalidated Phase until the sender has more data ready to send in the transmission buffer than is permitted by the current CWND. In some implementations, the decision to enter the Unvalidated Phase could require coordination with the management of buffers in the interface to the higher layers.

When a sender confirms the path and it receives an acknowledgement for the initial data without reported congestion, it MAY then enter the Unvalidated Phase. This transition occurs when a sender has more data than permitted by the current CWND.

When a path is not confirmed, Careful Resume is not used and the sender enters the Normal Phase.

Implementation requirements are provided in Section 4.2.
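
The Reconnaissance Phase conditions can be gathered into one decision, sketched below in Python. This is not a normative algorithm. The function name, its arguments and the string results are hypothetical. It assumes the caller invokes it only after the initial data has been acknowledged, and that age is the time since the CC parameters were saved.

```python
from typing import Optional


def reconnaissance_decision(
    saved_token: Optional[bytes],
    current_token: bytes,
    age: float,
    lifetime: float,
    saved_rtt: float,
    current_rtt: float,
    congestion_detected: bool,
    receiver_opt_out: bool,
    saved_cwnd_in_use: bool,
    bytes_ready: int,
    cwnd: int,
) -> str:
    """Return "normal", "reconnaissance" (keep waiting) or "unvalidated"."""
    # No saved CC parameters, or the Endpoint Token differs.
    if saved_token is None or saved_token != current_token:
        return "normal"
    # Lifetime of the saved CC parameters exceeded.
    if age > lifetime:
        return "normal"
    # RTT confirmation: saved_rtt / 2 < current_rtt <= saved_rtt x 10.
    if not (saved_rtt / 2 < current_rtt <= saved_rtt * 10):
        return "normal"
    # Receiver request, detected congestion, or saved_cwnd already used.
    if receiver_opt_out or congestion_detected or saved_cwnd_in_use:
        return "normal"
    # Data-limited sender: wait until more data is ready than CWND permits.
    if bytes_ready <= cwnd:
        return "reconnaissance"
    return "unvalidated"
```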

3.3. Unvalidated Phase

The Unvalidated Phase is designed to enable the CWND to more rapidly get up to speed.

This phase paces transmission using an increased CWND (jump_cwnd) that is calculated based on the saved CC parameters and current_rtt.

  • Unvalidated Phase (jump_cwnd): To avoid starving other flows that could have either started or increased their used capacity after the Observe Phase, the jump_cwnd MUST be no more than half of the saved_cwnd. Hence, jump_cwnd is less than or equal to (saved_cwnd/2).

  • Unvalidated Phase (Pacing): Transmission using an unvalidated CWND MUST use pacing.

  • Unvalidated Phase (Confirming the path): If a sender determines that the previous CC parameters are not valid due to a detected change in the path (e.g., the RTT has changed), Careful Resume enters the Safe Retreat Phase. (The sender cannot receive feedback for the jump_cwnd, because less than an RTT has passed since the Unvalidated Phase was entered. Therefore, any detected congestion must have resulted from packets sent before the Unvalidated Phase.)

  • The sender enters the Validating Phase when an acknowledgement is received for the first packet number (or higher) that was sent in the Unvalidated Phase.

Implementation notes are provided in Section 4.3.

3.4. Validating Phase

The Validating Phase checks that packets sent in the Unvalidated Phase were received without inducing congestion. The CWND remains unvalidated and the sender typically remains in this phase for one RTT. (Note: When the jump_cwnd is not fully utilised, a smaller capacity is validated.)

  • Validating Phase (Limiting CWND): On entry to the Validating Phase, the CWND is set to the flight size.

  • Validating Phase (Updating CWND): The CWND is updated using the normal rules for the current congestion controller. The default rule is Reno.

  • Validating Phase (Validating capacity): If a sender determines that congestion was experienced (e.g., packet loss or ECN-CE marking), Careful Resume enters the Safe Retreat Phase.

  • The sender enters the Normal Phase when an acknowledgement is received for the last packet number (or higher) that was sent in the Unvalidated Phase.
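
The entry rule and the two exits of the Validating Phase can be sketched as follows. This Python fragment is illustrative only. The function names and the string results are hypothetical, and it assumes congestion is checked before the exit to the Normal Phase.

```python
def enter_validating(flight_size: int) -> int:
    """On entry to the Validating Phase, the CWND is set to the flight size."""
    return flight_size


def validating_on_ack(
    largest_acked_pn: int,
    last_unvalidated_pn: int,
    congestion: bool,
) -> str:
    """Return the phase to use after processing an acknowledgement."""
    # Packet loss or ECN-CE marking: capacity was not validated.
    if congestion:
        return "safe_retreat"
    # The last packet sent in the Unvalidated Phase (or higher) was acked.
    if largest_acked_pn >= last_unvalidated_pn:
        return "normal"
    return "validating"
```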

3.5. Safe Retreat Phase

This phase is entered when capacity was not validated. It starts when the first loss/ECN-CE marking is detected. (This trigger is the same as used by a QUIC sender to transition from Slow Start to Recovery [RFC9002].)

{XXX-Editor note: This section to be updated in next rev XXX}

  • Safe Retreat Phase (Saved information): The set of saved CC parameters for the path are deleted, to prevent these from being used again by other flows.

  • Safe Retreat Phase (Re-initializing CC): On entry, the CWND MUST be reduced to no more than the CWND on entry to the Unvalidated Phase. (This can be set to the IW.) This avoids persistent starvation by allowing other flows to regain their share of the total capacity.

  • Safe Retreat Phase (QUIC recovery): When the CWND is reduced, a QUIC sender can immediately send a single packet prior to the reduction [RFC9002]. (This speeds up loss recovery if the data in the lost packet is retransmitted and is similar to TCP as described in Section 5 of [RFC6675].)

  • Safe Retreat Phase (Increasing CWND): The CWND MAY be increased for each acknowledgment that acknowledges a previously unacknowledged packet that was sent in the Unvalidated Phase, since this indicates a packet has been successfully sent across the path.

  • The sender enters Normal Phase when the last packet (or later) sent during the Unvalidated Phase has been acknowledged.

Implementation requirements are provided in Section 4.5.
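
The entry actions for the Safe Retreat Phase (deleting the saved CC parameters and reducing the CWND) can be sketched in Python as below. The function name and the dictionary used as a parameter store are hypothetical. Taking the smaller of the IW and the CWND held on entry to the Unvalidated Phase is one conservative choice that satisfies the requirement, not the only one.

```python
from typing import Dict


def enter_safe_retreat(
    cwnd_before_unvalidated: int,
    iw: int,
    saved_params: Dict[bytes, object],
    endpoint_token: bytes,
) -> int:
    """Delete the saved CC parameters and return the reduced CWND."""
    # Prevent the saved parameters from being used again by other flows.
    saved_params.pop(endpoint_token, None)
    # No more than the CWND on entry to the Unvalidated Phase; here the
    # IW is used whenever it is the smaller of the two values.
    return min(cwnd_before_unvalidated, iw)
```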

3.5.1. Loss Recovery after entering Safe Retreat

Unacknowledged packets that were sent in the Unvalidated Phase can be lost when there is congestion. Loss recovery commences using the reduced CWND that was set on entry to the Safe Retreat Phase.

  • NOTE: A TCP or SCTP sender is required to retransmit all lost data. For QUIC and DCCP, the need for loss recovery depends on the sender policy for retransmission.

  • NOTE: During loss recovery, a receiver can cumulatively acknowledge data that was previously sent in the Unvalidated Phase in addition to acknowledging successful retransmission of data. [RFC3465] describes how to appropriately account for such acknowledgments.

  • NOTE: On entry to the Safe Retreat Phase, the CWND can be significantly reduced. When there were multiple losses, recovery of all lost data could require multiple RTTs to complete.

The sender leaves the Safe Retreat Phase when an acknowledgement is received for the last packet number (or higher) sent in the Unvalidated Phase. If the last packet number is not cumulatively acknowledged, then additional packets might need to be retransmitted.

CC methods using a slow-start threshold need to update this from the CWND (i.e., ssthresh is set to CWND).

The Normal Phase is then entered.

3.6. Normal Phase

In the Normal Phase, the sender transitions to using the normal CC method (e.g., in congestion avoidance).

  • Normal Phase (Updating CC): The sender MUST reset the CWND on entry to the Normal Phase to reflect the volume of acknowledged data that was received during the Unvalidated Phase. (When the sender has used the entire jump_cwnd and this was acknowledged in full, no adjustment is needed.)

Implementation requirements are provided in Section 4.6.

3.7. RTO Expiry while using Careful Resume

A sender that experiences a Retransmission Time Out (RTO) expiry ceases to use Careful Resume. The sender enters the Normal Phase.

  • NOTE: As in loss recovery, data sent in the Unvalidated Phase could be later acknowledged after an RTO event (see Section 3.5.1).

4. Congestion Control Guidelines and Requirements

This section provides guidance for implementation and use.

4.1. Determining the Current Path Capacity in the Observe Phase

There are various approaches to measuring the capacity used by a connection. Congestion controllers, such as CUBIC or Reno, can estimate the capacity by utilizing the CWND or flight_size. A different approach could estimate the same parameters for a rate-based congestion controller, such as BBR [I-D.cardwell-iccrg-bbr-congestion-control], or by observing the rate at which data is acknowledged by the remote endpoint.

Implementations are expected to include a lifetime parameter in the CC parameters that can be used to remove old CC parameters when they are no longer needed, or when the path information is out of date.

  • Observe Phase: There are cases where the current CWND does not reflect the path capacity. At the end of slow start, the CWND can be significantly larger than needed to fully utilize the path (i.e., a CWND overshoot). It is inappropriate to use an overshoot in the CWND as a basis for estimating the capacity. In most cases, the CWND will converge to a stable value after several more RTTs. One mitigation could be to set the saved_cwnd based on the flight_size, or an averaged CWND.

  • Observe Phase (application-limited): When the sender is application-limited or in an RTT following a burst of transmission, a sender typically transmits much less data than allowed. Such observations ought to be discounted when estimating the saved_cwnd.
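
One possible way to combine the two mitigations above (basing saved_cwnd on the flight_size and discounting application-limited observations) is sketched below in Python. The function name and the sample format are hypothetical. Using a plain average is an assumption, since this document does not prescribe an estimator.

```python
from typing import Iterable, Optional, Tuple


def estimate_saved_cwnd(
    samples: Iterable[Tuple[int, bool]],
) -> Optional[int]:
    """Average per-RTT flight_size samples, skipping app-limited ones.

    Each sample is (flight_size_bytes, app_limited). Returns None when
    no usable observation exists, in which case nothing is saved.
    """
    usable = [flight for flight, app_limited in samples if not app_limited]
    if not usable:
        return None
    return sum(usable) // len(usable)
```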

4.2. Confirming the Path in the Reconnaissance Phase

In the Reconnaissance Phase a sender initiates a connection and starts sending initial data. This measures the current minimum RTT. If a decision is made to use Careful Resume, this is used to confirm the path.

The CC is not modified during the Reconnaissance Phase. A sender therefore needs to limit the initial data, sent in the first RTT of transmitted data, to not more than the IW [RFC9000]. This transmission using the IW is assumed to be a safe starting point for any path to avoid adding excessive load to a potentially congested path. (When used in a controlled network, additional information about local path characteristics could be known, which might be used to configure a non-standard IW.)

The method does not permit multiple concurrent reuse of the saved CC parameters. When multiple new concurrent connections are made to a server, each can have a valid endpoint_token, but the saved_cwnd can only be used for one new connection. This is designed to prevent a sender from performing multiple jumps in the cwnd, each individually based on the same saved_cwnd, and hence creating an excessive aggregate load at the bottleneck.

The method used to prevent re-use of the saved CC parameters will depend on the design of the server that is being used (e.g., if all connections from a given client IP arrive at the same server process, then the server process could use a hash table). A distributed system might be required when using some types of load balancing, to ensure this invariant when the load balancing hashes connections by 4-tuple and hence multiple connections from the same client device are served by different server processes.
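
For the single-process case mentioned above, the "only one connection" rule could be enforced with a small table of claimed Endpoint Tokens. The Python sketch below is a hypothetical illustration: the class and method names are not defined by this document, and a distributed deployment would need a shared store instead.

```python
import threading
from typing import Set


class SavedCwndClaims:
    """Tracks which Endpoint Tokens already have a connection using
    their saved_cwnd (single server process only)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._claimed: Set[bytes] = set()

    def try_claim(self, endpoint_token: bytes) -> bool:
        """Return True for the first claimant; later callers get False
        and are expected to enter the Normal Phase."""
        with self._lock:
            if endpoint_token in self._claimed:
                return False
            self._claimed.add(endpoint_token)
            return True
```

When a claim ends (e.g., after the saved parameters are refreshed or deleted) is left open here, because this document does not define it.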

4.2.1. Confirming the Path

Path characteristics can change over time for many reasons, resulting in the previously observed CC parameters becoming irrelevant. The sender therefore compares the saved_rtt with each of a series of measured RTT samples.

If the current RTT sample is less than half of the saved_rtt, this is regarded as too small, and is an indicator of a path change. (This factor of two arises because the jump_cwnd is calculated as half the measured saved_cwnd, and the resumed rate should not exceed the rate observed when the saved_cwnd was measured.)

A current RTT larger than that at the time the saved_cwnd was measured results in a proportionally lower resumed rate, because the transmission using the CR method is paced based on the current RTT. An RTT sample more than ten times the saved_rtt is regarded as too large; such a high RTT is indicative of a path change. (The factor of ten accommodates both increases in latency from buffering on a path and any variation between samples.)
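
The lower bound can be checked with a short calculation. It assumes, as a simplification, that a paced sender's rate is approximately its window divided by the RTT, and it uses the rule that jump_cwnd is no more than half of saved_cwnd:

```latex
\[
\frac{\mathit{jump\_cwnd}}{\mathit{current\_rtt}}
\;\le\; \frac{\mathit{saved\_cwnd}}{2\,\mathit{current\_rtt}}
\;<\; \frac{\mathit{saved\_cwnd}}{\mathit{saved\_rtt}}
\quad\text{when}\quad
\mathit{current\_rtt} > \frac{\mathit{saved\_rtt}}{2}.
\]
```

The right-hand side is the rate observed when the saved_cwnd was measured, so a confirmed RTT keeps the resumed rate below that observed rate.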

The sender reverts to the Normal Phase if congestion is detected. Some transport protocols implement methods that infer potential congestion from an increase in the RTT. In the Reconnaissance Phase, this indication occurs earlier than congestion that is reported by loss or by ECN marking. Designs need to consider if this is a suitable trigger to revert to the Normal Phase.

4.3. Safety Requirements for the Unvalidated Phase

This section defines the safety requirements for using saved CC parameters to tentatively update the CWND. These safety requirements mitigate the risk of adding excessive congestion to an already congested path.

  • Unvalidated Phase (Jump): A connection must not directly use the previously saved_cwnd to initialize a new flow, which would cause it to resume sending at the same rate. The jump_cwnd MUST be no more than half the previously saved_cwnd.

  • Unvalidated Phase (Pacing): Sending is paced based on the current_rtt. Using the current_rtt, rather than the saved_rtt, helps to ensure appropriate pacing, but places a limitation on the minimum acceptable current_rtt to avoid sending at a rate higher than was previously observed.

4.3.1. Lifetime of CC Parameters

Long-term use of the previously observed parameters is not appropriate. A lifetime therefore needs to be specified during which the saved CC parameters can be safely re-used.

[RFC9040] provides guidance on the implementation of TCP Control Block Interdependence, but does not specify how long a saved parameter can safely be reused.

[RFC7661] specifies a method for managing an unvalidated CWND. This states: "After a fixed period of time (the non-validated period (NVP)), the sender adjusts the cwnd (Section 4.4.3). The NVP SHOULD NOT exceed five minutes." Section 5 of [RFC7661] discusses the rationale for choosing that period. However, RFC 7661 targets rate-limited connections using normal CC. The method described in the present specification includes additional mechanisms to avoid and mitigate the effects of overshoot, and therefore this can be used to justify a longer lifetime of the saved_cwnd using the Careful Resume method.

{XXX-Editor NOTE: A future revision of this document could specify a maximum time that CC Parameters can be cached - Ought this to be minutes, hours, days?}

4.3.2. Pacing in the Unvalidated Phase

The sender MUST avoid sending a burst of packets greater than IW as a result of a step-increase in the CWND. (This is consistent with [RFC8085] and [RFC9000].) Pacing sent packets as a function of the current RTT provides additional safety during the Unvalidated Phase. Other sender mitigations have also been suggested to avoid line-rate bursts (e.g., [I-D.hughes-restart]).

The following example provides a suitable pacing interval using the current RTT and the saved_cwnd. The Inter-packet Transmission Time (ITT) is determined by using the current Maximum Message Size (MMS), the saved_cwnd and the current RTT. A safety margin can avoid sending more than a recommended maximum (max_jump):

  • jump_cwnd = min(max_jump, saved_cwnd/2)

  • ITT = (current_rtt x MMS)/jump_cwnd

This follows the idea presented in [RFC4782], [I-D.irtf-iccrg-sallantin-initial-spreading] and [CONEXT15].
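
The two pacing formulas of this section translate directly into code. The Python sketch below assumes bytes for the window and message sizes and seconds for the RTT. The function name is hypothetical, and the numbers in the usage note are illustrative, not measurements.

```python
from typing import Tuple


def pacing_parameters(
    saved_cwnd: int,
    max_jump: int,
    current_rtt: float,
    mms: int,
) -> Tuple[int, float]:
    """Return (jump_cwnd in bytes, ITT in seconds)."""
    # jump_cwnd = min(max_jump, saved_cwnd/2)
    jump_cwnd = min(max_jump, saved_cwnd // 2)
    if jump_cwnd <= 0:
        raise ValueError("jump_cwnd must be positive to pace packets")
    # ITT = (current_rtt x MMS) / jump_cwnd
    itt = (current_rtt * mms) / jump_cwnd
    return jump_cwnd, itt
```

For instance, with a saved_cwnd of 600000 bytes, a max_jump of 1000000 bytes, a current RTT of 0.6 seconds and an MMS of 1200 bytes, the jump_cwnd is 300000 bytes and one packet is sent roughly every 2.4 milliseconds.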

4.3.3. Exit from the Unvalidated Phase because of Variable Network Conditions

  • Unvalidated Phase: Careful Resume is REQUIRED to be robust to changes in network conditions due to variations in the forwarding path, reconfiguration of equipment, or changes in the link conditions. This is mitigated by path confirmation.

  • Unvalidated Phase: Careful Resume is REQUIRED to be robust to changes in network traffic, including the arrival of new flows that compete for capacity at a shared bottleneck. This is mitigated by jumping to no more than a half of the saved_cwnd and by using pacing.

  • Unvalidated Phase: Careful Resume is REQUIRED to prevent unduly suppressing flows that used the capacity since the available capacity was measured. This is further mitigated by bounding the duration of the Unvalidated Phase (and the following Validating Phase).

  • Unvalidated Phase: The sender transitions to the Safe Retreat Phase when a packet loss is detected or acknowledgments indicate sent packets were ECN CE-marked. These are an indication of potential congestion.

4.4. Safety Requirements for the Validating Phase

When a sender completes the Unvalidated Phase, either by sending a jump_cwnd of data or after one RTT, it ceases to use the unvalidated CWND. That is, CWND is reset to the flight size, and the sender awaits reception of the acknowledgments to validate the use of this capacity. New packets are sent when previously sent data is newly acknowledged. The purpose is to trigger a safe retreat in the case when the capacity is not validated.

4.5. Safety Requirements for the Safe Retreat Phase

This section defines the safety requirements after congestion has been detected during the Unvalidated Phase.

The Safe Retreat reaction is required to differ from a traditional reaction to detected congestion, because the jump_cwnd can result in a significantly higher rate than would be allowed by the slow-start mechanism. This could aggressively feed a congested bottleneck, resulting in overshoot where a disproportionate number of packets from existing flows are displaced from the buffer at the congested bottleneck. For this reason, a sender needs to react to detected congestion by reducing CWND significantly below the saved_cwnd.

Note: Proportional Rate Reduction (PRR) [RFC6937] assumes that it is safe to reduce the rate gradually when in congestion avoidance. PRR is therefore not appropriate when there might be significant overshoot in the use of the capacity, which can be the case when the Safe Retreat Phase is entered.

The CWND is reduced on entry to the Safe Retreat Phase to a safe starting value.

{XXX-Editor note: This section to be completed XXX}

This provides examples of how to implement the Safe Retreat Phase:

  1. A simple conservative design sets CWND to IW and then resumes using normal slow-start. This does not require measuring the measured at congestion. The resulting pattern of CWND growth resembles that which would have occurred had the design not been used.

  2. The volume of packets successfully transmitted during the Unvalidated Phase (determined, e.g., by recording the sequence number of the first packet sent in the phase) is used as a measure of the maximum capacity, called the Pipe. The Pipe is not a safe measure of the currently available share of the capacity whenever there was also a significant overshoot at the bottleneck, as indicated by excessive loss. Therefore, any design that increases CWND based on received acknowledgments ought to avoid unduly taking capacity from sharing flows.
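The two examples above can be sketched as follows. This is an illustration under stated assumptions: the value chosen for `IW`, the function names, and the representation of acknowledged packets as `(sequence number, size)` pairs are hypothetical. The sketch only measures the Pipe; it deliberately does not derive a CWND from it, because this section does not yet specify how that ought to be done.

```python
from typing import Iterable, Tuple

# Assumed initial window in bytes (10 segments of 1460 bytes); illustrative only.
IW = 10 * 1460


def safe_retreat_simple(iw: int = IW) -> dict:
    """Example 1: set CWND to IW and resume using normal slow-start."""
    return {"cwnd": iw, "mode": "slow_start"}


def measure_pipe(acked: Iterable[Tuple[int, int]], first_unvalidated_seq: int) -> int:
    """Example 2: total acknowledged bytes sent from the start of the Unvalidated Phase.

    `acked` holds (sequence number, size in bytes) for each acknowledged packet.
    Packets sent before `first_unvalidated_seq` are not counted.
    """
    return sum(size for seq, size in acked if seq >= first_unvalidated_seq)
```

Recording the sequence number of the first packet sent in the Unvalidated Phase is sufficient to separate the packets that contribute to the Pipe from those sent earlier.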

4.6. Returning to Normal Congestion Control

After using Careful Resume, the CC controller returns to the Normal Phase. The implementation details for different transports depend on the design of the transport.

  • For NewReno and CUBIC, it is recommended to exit slow-start and enter the congestion avoidance phase of the CC method.

  • For BBR CC, it is recommended to enter the "probe bandwidth" state.

{XXX-Editor note: A future revision should discuss updating the saved parameters, whether used or not, after reaching normal operation for use the next time even if that update is to just refresh the expiration time.}

4.7. Limitations from Transport Protocols

A sender is limited by any rate-limitation of the transport protocol that is used.

  • For QUIC this includes flow control mechanisms or preventing amplification attacks. In particular, a QUIC receiver might need to issue proactive MAX_DATA frames to increase the flow control limits of a connection that is started when using Careful Resume to gain the expected benefit.

  • A TCP sender is limited by the receiver window (rwnd). Unless configured at a receiver, the rwnd constrains the rate of increase for a connection and reduces the benefit of Careful Resume.
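The effect of these limits can be shown with a small sketch. The function and parameter names are hypothetical; `peer_limit_remaining` stands for whatever credit the receiver currently grants (for example, the remaining rwnd for TCP or the remaining connection flow control credit for QUIC).

```python
def sendable_bytes(cwnd: int, flight_size: int, peer_limit_remaining: int) -> int:
    """Return the bytes a sender may transmit now.

    The congestion controller allows cwnd - flight_size bytes, but the
    sender cannot exceed the credit granted by the receiver.
    """
    cc_allowance = max(0, cwnd - flight_size)
    return min(cc_allowance, max(0, peer_limit_remaining))
```

For instance, a jump to a CWND of 200,000 bytes gives no more than 65,535 bytes in flight when the receiver grants only 65,535 bytes of credit, which is why the receiver might need to raise its limits for Careful Resume to provide the expected benefit.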

5. QLOG support for QUIC

{XXX-Editor note: This section to be completed XXX}

This section provides definitions that enable the Careful Resume method to generate qlog events when using QUIC. It introduces an event to report the current phase of a sender, and the associated description.

The event and data structure definitions in this section are expressed in the Concise Data Definition Language (CDDL) [RFC8610] and its extensions described in [I-D.ietf-quic-qlog-quic-events].

5.1. cr_phase Event

Importance: Extra

This event is generated when the CC algorithm changes the Careful Resume Phase described in Section 3 of this specification.

Definition:

XX Fix me as CDDL XX
<!-- CrState::OBSERVE => { 0 }
CrState::RECON => { 1 }
CrState::UNVAL => { 2 }
CrState::VALIDATE => { 3 }
CrState::NORMAL => { 4 }
CrState::RETREAT => { 100 } -->

Figure 1
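Until the CDDL definition is completed, the phase values listed above can be illustrated with a short sketch. The numeric values are taken from the placeholder definition; the `CrPhase` name, the helper function, and the `old` and `new` field names are assumptions made for this example and are not part of any qlog schema.

```python
from enum import IntEnum


class CrPhase(IntEnum):
    """Careful Resume phases, numbered as in the placeholder definition."""
    OBSERVE = 0
    RECON = 1
    UNVAL = 2
    VALIDATE = 3
    NORMAL = 4
    RETREAT = 100


def cr_phase_event(old: CrPhase, new: CrPhase) -> dict:
    """Build a hypothetical cr_phase event describing a phase change."""
    return {
        "name": "cr_phase",
        "data": {"old": old.name.lower(), "new": new.name.lower()},
    }
```

A transition from the Unvalidated Phase to the Safe Retreat Phase would then be reported with `cr_phase_event(CrPhase.UNVAL, CrPhase.RETREAT)`.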

6. Acknowledgments

The authors would like to thank John Border, Gabriel Montenegro, Patrick McManus, Ian Swett, Igor Lubashev, Robin Marx, Roland Bless, Franklin Simo, Kazuho Oku, Tong, and Neal Cardwell for their fruitful comments on earlier versions of this document.

The authors would like to particularly thank Tom Jones for co-authoring several previous versions of this document. Ana Custura developed the qlog support.

7. IANA Considerations

No current parameters are required to be registered by IANA.

8. Security Considerations

This document does not exhibit specific security considerations. Security considerations for the interactions with the receiver are discussed in [I-D.kuhn-quic-bdpframe-extension].

9. References

9.1. Normative References

[RFC2119]
Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, <https://www.rfc-editor.org/info/rfc2119>.
[RFC8085]
Eggert, L., Fairhurst, G., and G. Shepherd, "UDP Usage Guidelines", BCP 145, RFC 8085, DOI 10.17487/RFC8085, <https://www.rfc-editor.org/info/rfc8085>.
[RFC8174]
Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, <https://www.rfc-editor.org/info/rfc8174>.
[RFC8610]
Birkholz, H., Vigano, C., and C. Bormann, "Concise Data Definition Language (CDDL): A Notational Convention to Express Concise Binary Object Representation (CBOR) and JSON Data Structures", RFC 8610, DOI 10.17487/RFC8610, <https://www.rfc-editor.org/info/rfc8610>.
[RFC8801]
Pfister, P., Vyncke, É., Pauly, T., Schinazi, D., and W. Shao, "Discovering Provisioning Domain Names and Data", RFC 8801, DOI 10.17487/RFC8801, <https://www.rfc-editor.org/info/rfc8801>.
[RFC9000]
Iyengar, J., Ed. and M. Thomson, Ed., "QUIC: A UDP-Based Multiplexed and Secure Transport", RFC 9000, DOI 10.17487/RFC9000, <https://www.rfc-editor.org/info/rfc9000>.

9.2. Informative References

[CONEXT15]
Li, Q., Dong, M., and P B. Godfrey, "Halfback: Running Short Flows Quickly and Safely", ACM CoNEXT.
[I-D.cardwell-iccrg-bbr-congestion-control]
Cardwell, N., Cheng, Y., Yeganeh, S. H., Swett, I., and V. Jacobson, "BBR Congestion Control", Work in Progress, Internet-Draft, draft-cardwell-iccrg-bbr-congestion-control-02, <https://datatracker.ietf.org/doc/html/draft-cardwell-iccrg-bbr-congestion-control-02>.
[I-D.hughes-restart]
Hughes, A., "Issues in TCP Slow-Start Restart After Idle", Work in Progress, Internet-Draft, draft-hughes-restart-00, <https://datatracker.ietf.org/doc/html/draft-hughes-restart-00>.
[I-D.ietf-quic-qlog-quic-events]
Marx, R., Niccolini, L., Seemann, M., and L. Pardue, "QUIC event definitions for qlog", Work in Progress, Internet-Draft, draft-ietf-quic-qlog-quic-events-06, <https://datatracker.ietf.org/doc/html/draft-ietf-quic-qlog-quic-events-06>.
[I-D.irtf-iccrg-sallantin-initial-spreading]
Sallantin, R., Baudoin, C., Arnal, F., Dubois, E., Chaput, E., and A. Beylot, "Safe increase of the TCP's Initial Window Using Initial Spreading", Work in Progress, Internet-Draft, draft-irtf-iccrg-sallantin-initial-spreading-00, <https://datatracker.ietf.org/doc/html/draft-irtf-iccrg-sallantin-initial-spreading-00>.
[I-D.kuhn-quic-bdpframe-extension]
Kuhn, N., Emile, S., Fairhurst, G., and C. Huitema, "BDP_Frame Extension", Work in Progress, Internet-Draft, draft-kuhn-quic-bdpframe-extension-03, <https://datatracker.ietf.org/doc/html/draft-kuhn-quic-bdpframe-extension-03>.
[IJSCN]
Thomas, L., Dubois, E., Kuhn, N., and E. Lochin, "Google QUIC performance over a public SATCOM access", International Journal of Satellite Communications and Networking, DOI 10.1002/sat.1301.
[MAPRG111]
Kuhn, N., Stephan, E., Fairhurst, G., Jones, T., and C. Huitema, "Feedback from using QUIC's 0-RTT-BDP extension over SATCOM public access", IETF 111 - MAPRG meeting.
[RFC3465]
Allman, M., "TCP Congestion Control with Appropriate Byte Counting (ABC)", RFC 3465, DOI 10.17487/RFC3465, <https://www.rfc-editor.org/info/rfc3465>.
[RFC4782]
Floyd, S., Allman, M., Jain, A., and P. Sarolahti, "Quick-Start for TCP and IP", RFC 4782, DOI 10.17487/RFC4782, <https://www.rfc-editor.org/info/rfc4782>.
[RFC5783]
Welzl, M. and W. Eddy, "Congestion Control in the RFC Series", RFC 5783, DOI 10.17487/RFC5783, <https://www.rfc-editor.org/info/rfc5783>.
[RFC6675]
Blanton, E., Allman, M., Wang, L., Jarvinen, I., Kojo, M., and Y. Nishida, "A Conservative Loss Recovery Algorithm Based on Selective Acknowledgment (SACK) for TCP", RFC 6675, DOI 10.17487/RFC6675, <https://www.rfc-editor.org/info/rfc6675>.
[RFC6937]
Mathis, M., Dukkipati, N., and Y. Cheng, "Proportional Rate Reduction for TCP", RFC 6937, DOI 10.17487/RFC6937, <https://www.rfc-editor.org/info/rfc6937>.
[RFC7661]
Fairhurst, G., Sathiaseelan, A., and R. Secchi, "Updating TCP to Support Rate-Limited Traffic", RFC 7661, DOI 10.17487/RFC7661, <https://www.rfc-editor.org/info/rfc7661>.
[RFC8867]
Sarker, Z., Singh, V., Zhu, X., and M. Ramalho, "Test Cases for Evaluating Congestion Control for Interactive Real-Time Media", RFC 8867, DOI 10.17487/RFC8867, <https://www.rfc-editor.org/info/rfc8867>.
[RFC9002]
Iyengar, J., Ed. and I. Swett, Ed., "QUIC Loss Detection and Congestion Control", RFC 9002, DOI 10.17487/RFC9002, <https://www.rfc-editor.org/info/rfc9002>.
[RFC9040]
Touch, J., Welzl, M., and S. Islam, "TCP Control Block Interdependence", RFC 9040, DOI 10.17487/RFC9040, <https://www.rfc-editor.org/info/rfc9040>.
[RFC9406]
Balasubramanian, P., Huang, Y., and M. Olson, "HyStart++: Modified Slow Start for TCP", RFC 9406, DOI 10.17487/RFC9406, <https://www.rfc-editor.org/info/rfc9406>.

Appendix A. Appendix: An Endpoint Token

This appendix proposes an Endpoint Token that allows a sender to identify its own view of the network path that it is using. In [I-D.kuhn-quic-bdpframe-extension], this Endpoint Token could be shared with other parties as an opaque path identifier, and the sender could later verify whether it identifies one of its current paths.

A.1. Creating an Endpoint Token

When computing the Endpoint Token, the sender includes information to identify the path on which it sends, for example:

  • it needs to include a unique identifier for itself (e.g., a globally assigned address/prefix; or randomly chosen value such as a nonce);

  • it needs to include an identifier for the destination (e.g., a destination IP address or name);

  • it needs to include an interface identifier (e.g., an index value or a MAC address to associate the endpoint with the interface on which the path starts);

  • it could include other information such as the DSCP, ports, flow label, etc. (recognising that this additional information might improve the path differentiation, but that this can reduce the re-usability of the token);

  • it could include any other information the sender chooses to include, potentially including PvD information [RFC8801] or information relating to its public-facing IP address;

  • it could include a time-dependent value to define the validity period of the token.

When creating an Endpoint Token, the sender has to ensure the following:

  1. To reduce the likelihood of misuse of the Endpoint Token, the value ought to be encoded in a way that hides the component information from the recipient and any eavesdropper on the path (this could already be protected by methods such as TLS).

  2. The sender can recalculate the Endpoint Token to validate a previously issued token, and the token can be included in the computed integrity check for any path information the sender provides.

  3. The Endpoint Token is designed so that, if shared, it prevents another party from deriving private data from the token or from using the token to perform unwanted linkability with other information. Therefore, the Endpoint Token MUST be different when used to identify paths using different interfaces.
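One possible way to meet these requirements is a keyed hash over the path components. The sketch below is an assumption for illustration, not a construction defined by this document: it uses HMAC-SHA256 with a secret known only to the sender, and the function and parameter names are hypothetical.

```python
import hashlib
import hmac


def make_endpoint_token(secret: bytes, sender_id: bytes,
                        destination: bytes, interface_id: bytes) -> bytes:
    """Compute an opaque token from the path components.

    Each field is length-prefixed so that different combinations of
    fields cannot produce the same input after concatenation.
    """
    fields = (sender_id, destination, interface_id)
    message = b"".join(len(f).to_bytes(2, "big") + f for f in fields)
    return hmac.new(secret, message, hashlib.sha256).digest()


def verify_endpoint_token(token: bytes, secret: bytes, sender_id: bytes,
                          destination: bytes, interface_id: bytes) -> bool:
    """Recalculate the token for a current path and compare in constant time."""
    expected = make_endpoint_token(secret, sender_id, destination, interface_id)
    return hmac.compare_digest(token, expected)
```

Because the token is keyed with a sender-only secret, a recipient cannot recover the component information from it, and because the interface identifier is part of the input, paths that use different interfaces yield different tokens.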

Appendix B. Appendix: Revision details

Previous individual submissions were discussed in TSVWG and QUIC.

Authors' Addresses

Nicolas Kuhn
Thales Alenia Space
Emile Stephan
Orange
Godred Fairhurst
University of Aberdeen
Department of Engineering
Fraser Noble Building
Aberdeen
AB24 3UE
United Kingdom
Raffaello Secchi
University of Aberdeen
Department of Engineering
Fraser Noble Building
Aberdeen
AB24 3UE
United Kingdom
Christian Huitema
Private Octopus Inc.