Internet-Draft Fast-Slow RTO and CC Algorithm March 2023
Jarvinen, et al. Expires 14 September 2023 [Page]
CoRE Working Group
Intended Status:
I. Jarvinen
University of Helsinki
M. Kojo
University of Helsinki
I. Raitahila
University of Helsinki
Z. Cao

Fast-Slow Retransmission Timeout and Congestion Control Algorithm for CoAP


This document specifies an alternative retransmission timeout (RTO) and congestion control back off algorithm for the CoAP protocol, called Fast-Slow RTO (FASOR).

The algorithm specified in this document employs an appropriate and large enough back off of RTO as the major congestion control mechanism to allow acquiring unambiguous RTT samples with high probability and to prevent building a persistent queue when retransmitting. The algorithm also aims to retransmit quickly using an accurately managed RTO when link-errors are occuring, basing RTO calculation on unambiguous round-trip time (RTT) samples.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 14 September 2023.

Table of Contents

1. Introduction

CoAP senders use retransmission timeout (RTO) to infer losses that have occurred in the network. For such a heuristic to be correct, the RTT estimate used for calculating the RTO must match to the real end-to-end path characteristics. Otherwise, unnecessary retransmission may occur. Both default RTO mechanism for CoAP [RFC7252] and the latest version of CoCoA [I-D.ietf-core-cocoa] have issues in dealing with unnecessary retransmissions and in the worst-case the situation can persist causing congestion collapse [JRCK18a].

This document specifies FASOR retransmission timeout and congestion control algorithm [JRCK18b]. FASOR algorithm ensures that unnecessary retransmissions due to an inaccurate RTT estimate will not persist, avoiding the threat of congestion collapse. FASOR also aims to quickly restore the accuracy of the RTT estimate. Armed with an accurate RTT estimate, FASOR not only handles congestion robustly but also can quickly infer losses due to link errors.

2. Conventions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14, RFC 2119 [RFC2119].

3. Problems with Existing CoAP Congestion Control Algorithms

Correctly inferring losses requires the RTO to be longer than the real RTT in the network. Under certain circumstances the RTO may be incorrectly small. If the real end-to-end RTT is larger than the RTO, it is impossible for the sender to avoid making unnecessary retransmissions that duplicate data still existing in the network because the sender cannot receive any feedback in time. Unnecessary retransmissions cause two basic problems. First, they increase the perceived end-to-end RTT if the bottleneck has buffering capacity, and second, they prevent getting unambiguous RTT samples. Making unnecessary retransmissions is also a pre-condition for the congestion collapse [RFC0896], which may occur in the worst case if retransmissions are not well controlled. Therefore, the sender RTO algorithm should actively attempt to prevent unnecessary retransmissions from persisting under any circumstance.

Karn's algorithm [KP87] has prevented unnecessary retransmission from turning into congestion collapse for decades due to robust RTT estimation and RTO backoff handling. The recent CoAP congestion control algorithms, however, diverge from the principles of Karn's algorithm in significant ways and may pose a threat to the stability of the Internet due to those differences [JRCK18a].

The default RTO mechanism for CoAP [RFC7252] uses only an initial RTO dithered between 2 and 3 seconds, while CoCoA [I-D.ietf-core-cocoa] measures RTT both from unambiguous and ambiguous RTT samples and applies a modified version of the TCP RTO algorithm [RFC6298]. The algorithm for default CoAP in RFC 7252 lacks solution to persistent congestion; the binary exponential backoff used for the RTO does not properly address unnecessary retransmissions when RTT is larger than the default RTO (ACK_TIMEOUT), because default CoAP does not apply a larger, backed-off RTO timer value from the previous message exchange to the backoff series of the subsequent message exchange as required by the Karn's algorithm [KP87]. If a default CoAP sender performs exchanges over an end-to-end path with such a high RTT, it persistently keeps making unnecessary retransmissions for every exchange wasting some fraction of the used resources (network capacity, battery power) [JRCK18a].

CoCoA [I-D.ietf-core-cocoa] attempts to improve scenarios with link-error related losses and solve persistent congestion by basing its RTO value on an estimated RTT. However, there are couple of exceptions when the RTT estimate is not available or it cannot be updated:

CoCoA shares with default CoAP the same shortcoming of not applying the backed-off RTO from the previous message exchange. This combined with either of the exceptions discussed above results in persistent unnecessary retransmissions also with CoCoA, when the RTT is high enough.

4. FASOR Algorithm

FASOR [JRCK18b] is composed of three key components: RTO computation, Slow RTO, and a novel RTO back off logic.

4.1. Computing Normal RTO (FastRTO)

The FASOR algorithm measures the RTT for a CoAP message exchange over an end-to-end path and computes the RTO value using the TCP RTO algorithm specified in [RFC6298]. We call this normal RTO as FastRTO. In contrast to the TCP RTO mechanism, FASOR SHOULD NOT use 1 second lower-bound when setting the RTO because RTO is the primary and only loss detection mechanism with CoAP, whereas RTO is only a backup mechanism for loss detection with TCP. A lower-bound of 1 second would impact timeliness of the loss detection in low RTT environments. The RTO value MAY be upper-bounded by at least 60 seconds. A CoAP sender using the FASOR algorithm SHOULD set the initial RTO to 2 seconds. The computed FastRTO value as well as the initial RTO value is subject to dithering; both are dithered between RTO + 1/4 x SRTT and RTO + SRTT. For dithering the initial RTO, SRTT is unset; therefore, SRTT is replaced with initial RTO / 3 which is derived from the RTO formula and equals to a hypothetical initial RTT sample that would yield the initial RTO of 2 seconds using the SRTT and RTTVAR initialization rule of RFC 6298. That is, for the initial RTO of 2 seconds we use SRTT value of 2/3 seconds.

FastRTO is updated only with unambiguous RTT samples. Therefore, it closely tracks the actual RTT of the network and can quickly trigger a retransmission when the network state is not dubious. Retransmitting without extra delay is very useful when the end-to-end path is subject to losses that are unrelated to congestion. When the first unambiguous RTT sample (R) is received, the RTT estimator is initialized with that sample as specified in RFC 6298, except RTTVAR that is set to R/2K where K = 4 [RFC6298].

4.2. Slow RTO

We introduce Slow RTO as a safe way to ensure that only a single copy of message is sent before at least one RTT has elapsed. To achieve this the sender must ensure that its RTO is set to a value that is larger than the path end-to-end RTT that may be inflated by unnecessary retransmissions themselves. Therefore, whenever a message needs to be retransmitted, we measure Slow RTO as the elapsed time required for getting an acknowledgement. That is, Slow RTO is measured starting from the original transmission of the message until the receipt of the acknowledgement, regardless of the number of retransmissions. In this way, Slow RTO always covers the worst-case RTT during which a number of unnecessary retransmissions were made but the acknowledgement is received for the original transmission. In contrast to computing the FastRTO, Slow RTO is not smoothed because it is derived from the sending pattern of the retransmissions (that may turn out unnecessary). In order to drain the potential unnecessary retransmissions successfully from the network, it makes sense to wait for the time used for sending them rather than some smoothed value. However, Slow RTO is multiplied by a factor to allow some growth in load without expiring Slow RTO too early (by default the factor of 1.5 is used). Whenever a message is retransmitted, FASOR applies Slow RTO as one of the backed off timer values used with the next message (see Section 4.3).

Slow RTO allows rapidly converging towards a stable operating point because 1) it lets the duplicate copies sent earlier to drain from the network reducing the perceived end-to-end RTT, and 2) allows enough time to acquire an unambiguous RTT sample for the FastRTO computation. Robustly acquiring the RTT sample ensures that the next RTO is set according to the recent measurement and further unnecessary retransmissions are avoided. Slow RTO itself is a form of back off because it includes the accumulated time from the RTO backoff of the previous exchange. FASOR uses this for its advantage as the time included into Slow RTO is what is needed to drain all unnecessary retransmissions possibly made during the previous exchange. Assuming a stable RTT and that all of the retransmissions were unnecessary, the time to drain them is the time elapsed from the original transmission to the sending time of the last retransmission plus one RTT. When the acknowledgement for the original transmission arrives, one RTT has already elapsed, leaving only the sending time difference still unaccounted for which is at minimum the value for Slow RTO (when an RTT sample arrives immediately after the last retransmission). Even if RTT would be increasing, the draining still occurs rapidly due to exponentially backed-off frequency in sending the unnecessary retransmissions.

4.3. Retransmission Timeout Backoff Logic

4.3.1. Overview

FASOR uses FastRTO as the base for binary exponential backoff when no retransmission were needed for the previous CoAP message exchange. When retransmission were needed for the previous CoAP message exchange, the algorithm rules, however, are more complicated than with the traditional RTO back off because Slow RTO is injected into the back off series to reduce high impact of using Slow RTO. FASOR logic chooses from three possible back off series alternatives:

  • FAST backoff: Perform traditional RTO back off with the FastRTO as the base. Applied when the previous message was not retransmitted.
  • FAST_SLOW_FAST backoff: First use the FastRTO for the original transmission of the message to improve cases with losses unrelated to congestion. If the original transmission of the message is successful without retransmissions, continue with the FAST backoff for the next message exchange. If the message needs to be retransmitted, continue by using Slow RTO for the first retransmission in order to respond to congestion and drain the network from the unnecessary retransmissions that were potentially sent during the previous message exchange. If still further RTOs are needed, continue by backing off the FastRTO further on each timeout. FAST_SLOW_FAST backoff is applied just once when the previous message using FAST backoff required one or more retransmissions.
  • SLOW_FAST backoff: Perform Slow RTO first for the original transmission to respond to congestion and to acquire an unambiguous RTT sample with high probability. Then, if the original message needs to be retransmitted, continue with the FastRTO-based RTO back off serie by backing off the FastRTO on each timeout. The SLOW_FAST backoff is applied when the previous message using the FAST_SLOW_FAST or SLOW_FAST backoff required one or more retransmissions. Once an acknowledgement for the original transmission with an unambigous RTT sample is received, continue with the FAST backoff for the next message exchange.

For the initial message, FAST backoff is used with INITIAL_RTO as the FastRTO value. From there on, state is updated when an acknowledgement arrives. Following unambiguous RTT samples, FASOR always uses FAST backoff. Whenever retransmissions are needed, the backoff series selection is first downgraded to FAST_SLOW_FAST backoff and then to SLOW_FAST backoff if further retransmissions are needed in FAST_SLOW_FAST backoff.

When Slow RTO in the SLOW_FAST backoff series is used as the first RTO value, the sender is likely to acquire an unambiguous RTT sample even when the network has high delay due to congestion because Slow RTO is based on a very recent measurement of the worst-case RTT. However, using Slow RTO may negatively impact the performance when losses unrelated to congestion are occurring. Due to its potential high cost, FASOR algorithm attempts to avoid using Slow RTO unnecessarily.

The CoAP protocol is often used by devices that are connected through a wireless network where non-congestion related losses are much more frequent than in their wired counterparts. This has implications for the RTO algorithm. While it would be possible to implement FASOR such that it always immediately uses Slow RTO when a dubious network state is detected with the previous message, which would handle congestion very well, it would do significant harm for performance when RTOs occur due to non-congestion related losses. Instead, in the FAST_SLOW_FAST state, FASOR uses first the FastRTO for the transmission of the original message and only responds using Slow RTO if the FastRTO expires also for that message. Such a pattern quickly probes if the losses were unrelated to congestion and only slightly delays acknowledgement if real congestion event is taking place. To ensure that an unambiguous RTT sample is also acquired on a congested network path, FASOR then needs to use Slow RTO for the original transmission of the subsequent message if the probe was not successful.

4.3.2. Retransmission State Machine

FASOR consists of the three states discussed above while making retranmission decisions on the backoff logic and RTO to use: FAST, FAST_SLOW_FAST, and SLOW_FAST. The state machine of the FASOR algorithm is depicted in Figure 1.

     |                                    |
     v                                    |
|   ^ ^             |                         ^   |
|   | |             |                         |   |
+-b-+ +------b------+                         +-a-+

   a: retransmission acknowledged, ambiguous RTT sample acquired;
   b: no retransmission, umambiguous RTT sample acquired;
Figure 1: State Machine of FASOR

The FAST state is the initial state and it is applied always when the previous message was not retransmitted. In the FAST state, if the original transmission of the message has not been acknowledged by the receiver within the time defined by FastRTO, the sender will retransmit it. If there is still no acknowledgement of the retransmitted packet within 2*FastRTO, the sender performs a second retransmission and, if necessary, each further retransmission also applies the binary exponential backoff of the FastRTO. The retransmission interval in this state is defined as FastRTO, 2^1 * FastRTO, ..., 2^i * FastRTO.

When an acknowledgement arrives after any retransmission in any of the states, the sender will calculate SlowRTO value based on the algorithm defined in Section 4.2.

In the FAST state, if an acknowledgement arrives after any retransmission, the sender will switch to the second state, FAST_FLOW_FAST. In this state, the retransmission interval is defined as FastRTO, Max(SlowRTO, 2*FastRTO), FastRTO * 2^1, ..., 2^i * FastRTO. The state will be switched back to the FAST state once an acknowledgement is returned within FastRTO, i.e., no retransmissions for a message. This is reasonable because it shows the network has recovered from congestion or a bloated queue.

If at least one retransmission has been made before the acknowledged arrives in the FAST_SLOW_FAST state, the sender updates the SlowRTO value, and moves to the third state, SLOW_FAST. The retransmission interval in the SLOW_FAST state is defined as SlowRTO, FastRTO, FastRTO * 2^1, ..., 2^i * FastRTO.

In the SLOW_FAST state, the sender switches back to the FAST state if an unambiguous acknowledgement arrives. Otherwise, if any number of retransmissions is needed again, the sender stays in the SLOW_FAST state.

4.4. Retransmission Count Option

When retransmissions are needed to deliver a CoAP message, it is not possible to measure RTT for the RTO computation as the RTT sample becomes ambiguous. Therefore, it would be beneficial to be able to distinguish whether an acknowledgement arrives for the original transmission of the message or for a retransmission of it. This would allow reliably acquiring an RTT sample for every CoAP message exchange and thereby compute a more accurate RTO even during periods of congestion and loss.

The Retransmission Count Option is used to distinguish whether an Acknowledgement message arrives for the original transmission or one of the retransmissions of a Confirmable message. However, the Retransmission Count Option cannot be used with an Empty Acknowledgement (or Reset) message because the CoAP protocol specification [RFC7252] does not allow adding options to an Empty message. Therefore, Retransmission Count Option is useful only for the common case of Piggybacked Response. In case of Empty Acknowledgements the operation of FASOR is the same as without the option. This restriction with Empty Acknowledgements may limit the usefulness of the Retransmission Count Option in deployment scenarios where the receiver is a proxy that will typically respond with an Empty Acknowledgement when it receives a request message.

Table 1: Retransmission Count Option
No. C U N R Name Format Length Default
TBD X Rexmit-Cnt uint 0-1 0
C=Critical, U=Unsafe, N=NoCacheKey, R=Repeatable

Implementation of the Retransmission Count option is optional and it is identified as elective. However, when it is present in a CoAP message and a CoAP endpoint processes it, it MUST be processed as described in this document. The Retransmission Count option MUST NOT occur more than once in a single message.

The value of the Retransmission Count option is a variable-size (0 to 1 byte) unsigned integer. The default value for the option is the number 0 and it is represented with an empty option value (a zero-length sequence of bytes). However, when a client intends to use Retransmit Count option, it MUST reserve space for it by limiting the request message size also when the value is empty in order to fit the full-sized option into retransmissions.

The Retransmission Count option can be present in both the request and response message. When the option is present in a request it indicates the ordinal number of the transmission for the request message.

If the server supports (implements) the Retransmission Count option and the option is present in a request, the server MUST echo the option value in its Piggybacked Response unmodified. If the server replies with an Empty Acknowledgement the server MUST silently ignore the option and MUST NOT include it in a later separate response to that request.

When Piggybacked Response carrying the Retransmission Count option arrives, the client uses the option to match the response message to the corresponding transmission of the request. In order to measure a correct RTT, the client must store the timestamp for the original transmission of the request as well as the timestamp for each retransmission, if any, of the request. The resulting RTT sample is used for the RTO computation. If the client retransmitted the request without the option but the response includes the option, the client MUST silently ignore the option.

The original transmission of a request is indicated with the number 0, except when sending the first request to a new destination endpoint (i.e., an endpoint not already in the memory). The first original transmission of the request to a new endpoint carries the number 255 (0xFF) and is interpreted the same as an original transmission carrying the number 0. Once the first Piggybacked Response from the new endpoint arrives the client learns whether or not the other endpoint implements the option. If the first response includes the echoed option, the client learns that the other endpoint supports the option and may continue including the option to each retransmitted request. From this point on the original transmissions of requests implicitly include the option number 0 and a zero-byte integer will be sent according to the CoAP uint-encoding rules. If the first Piggybacked Response does not include the option, the client SHOULD stop including the option into the requests to that endpoint. Retransmissions, if any, carry the ordinal number of the retransmission. That is, the client increments the retransmission count by one for each retransmission of the message.

When the Retransmission Count option is in use, the client bases the RTO for the FastRTO in the back off series as follows:

Previous-RTT-Sample is the RTT sample acquired from the previous message exchange. If no RTT sample was available with the previous message exchange (e.g., the server replied with an Empty Acknowledgement), RTO computed earlier is used like in case the Retransmission Count option is not in use.

4.5. Alternatives for Exchanging Retransmission Count Information

An alternative way of exchanging the retransmission count information between a client and server is to encode it in the Token. The Token is a client-local identifier and a client solely decides how it generates the Token. Therefore, including a varying Token value to retransmissions of the same request is all possible as long as the client can use the Token to differentiate between requests and match a response to the corresponding request. The server is required to make no assumptions about the content or structure of a Token and always echo the Token unmodified in its response.

How exactly a client encodes the retransmission count into a Token is an implementation issue. Note that the original transmission of a request may carry a zero-length Token given that the rules for generating a Token as specified in RFC 7252 [RFC7252] are followed. This allows reducing the overhead of including the Token into the reguests in such cases where Token could otherwise be omitted. However, similar to Retransmit Count option the maximum request message size MUST be limited to accommodate the Token with retransmit count into the retransmissions of the request.

5. Security Considerations

6. IANA Considerations

This memo includes no request to IANA.

7. References

7.1. Normative References

Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, , <>.
Paxson, V., Allman, M., Chu, J., and M. Sargent, "Computing TCP's Retransmission Timer", RFC 6298, DOI 10.17487/RFC6298, , <>.
Shelby, Z., Hartke, K., and C. Bormann, "The Constrained Application Protocol (CoAP)", RFC 7252, DOI 10.17487/RFC7252, , <>.

7.2. Informative References

Nagle, J., "Congestion Control in IP/TCP Internetworks", RFC 896, DOI 10.17487/RFC0896, , <>.
Bormann, C., Betzler, A., Gomez, C., and I. Demirkol, "CoAP Simple Congestion Control/Advanced", Work in Progress, Internet-Draft, draft-ietf-core-cocoa-03, , <>.
Karn, P. and C. Partridge, "Improving Round-trip Time Estimates in Reliable Transport Protocols", SIGCOMM'87 Proceedings of the ACM Workshop on Frontiers in Computer Communications Technology, .
Jarvinen, I., Raitahila, I., Cao, Z., and M. Kojo, "Is CoAP Congestion Safe?", Applied Networking Research Workshop (ANRW'18), .
Jarvinen, I., Raitahila, I., Cao, Z., and M. Kojo, "FASOR Retransmission Timeout and Congestion Control Mechanism for CoAP", Proceedings of IEEE Global Communications Conference (Globecom 2018), .

Appendix A. Pseudocode for Basic FASOR without Dithering

var state = NORMAL_RTO

rfc6298_init(var fastrto, 2 secs)

var slowrto

var original_sendtime
var retransmit_count

 * Sending Original Copy and Retransmitting 'req'
send_request(req) {
  original_sendtime =
  retransmit_count = 0


rto_for(req) {
  retransmit_count += 1


 * ACK Processings
ack() {
  sample = - original_sendtime
  if (retransmit_count == 0)

unambiguous_ack(sample) {
  k = 4                               // RFC6298 default K = 4
  if (rfc6298_is_first_sample(fastrto))
    k = 1
  rfc6298_update(fastrto, k, sample)  // Normal RFC6298 processing
  state = NORMAL_RTO

ambiguous_nextstate = {

ambiguous_ack(sample) {
  slowrto = sample * SLOWRTO_FACTOR
  state = ambiguous_nextstate[state]

 * RTO Calculations
calculate_rto() {
  return <state>_rtoseries()

normal_rtoseries() {
  switch (retransmit_count) {
    case 0: return fastrto_series_init()
    default: return fastrto_series_backoff()

fastslowfast_rtoseries() {
  switch (retransmit_count) {
    case 0: return fastrto_series_init()
    case 1: return MAX(slowrto, 2*fastrto)
    default: return fastrto_series_backoff()

slowfast_rtoseries() {
  switch (retransmit_count) {
    case 0: return slowrto
    case 1: return fastrto_series_init()
    default: return fastrto_series_backoff()

var backoff_series_timer

fastrto_series_init() {
  backoff_series_timer = fastrto
  return backoff_series_timer

fastrto_series_backoff() {
  backoff_series_timer *= 2
  return backoff_series_timer
Figure 2

Authors' Addresses

Ilpo Jarvinen
University of Helsinki
P.O. Box 68
Markku Kojo
University of Helsinki
P.O. Box 68
Iivo Raitahila
University of Helsinki
FI- Helsinki
Zhen Cao