Requirements for Time-Based Loss Detection

International Computer Science Institute
2150 Shattuck Ave., Suite 1100
Berkeley, CA 94704
United States of America
mallman@icir.org
https://www.icir.org/mallman

Area: Transport
Workgroup: TCPM
Keywords: retransmission timeout, packet loss, loss detection, requirements

Abstract
Many protocols must detect packet loss for various reasons
(e.g., to ensure reliability using retransmissions or to understand the
level of congestion along a network path). While many mechanisms have
been designed to detect loss, ultimately, protocols can only count on the
passage of time without delivery confirmation to declare a packet "lost".
Each implementation of a time-based loss detection mechanism represents a
balance between correctness and timeliness; therefore, no implementation
suits all situations. This document provides high-level requirements for
time-based loss detectors appropriate for general use in unicast
communication across the Internet. Within the requirements,
implementations have latitude to define particulars that best address each
situation.
Status of This Memo
This memo documents an Internet Best Current Practice.
This document is a product of the Internet Engineering Task Force
(IETF). It represents the consensus of the IETF community. It has
received public review and has been approved for publication by
the Internet Engineering Steering Group (IESG). Further information
on BCPs is available in Section 2 of RFC 7841.
Information about the current status of this document, any
errata, and how to provide feedback on it may be obtained at
the RFC Editor's information page for this document.
Copyright Notice
Copyright (c) 2020 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with
respect to this document. Code Components extracted from this
document must include Simplified BSD License text as described in
Section 4.e of the Trust Legal Provisions and are provided without
warranty as described in the Simplified BSD License.
Table of Contents
Introduction
Terminology
Context
Scope
Requirements
Discussion
Security Considerations
IANA Considerations
References
Normative References
Informative References
Acknowledgments
Author's Address
Introduction
As a network of networks, the Internet consists of a large variety
of links and systems that support a wide variety of tasks and
workloads. The service provided by the network varies from
best-effort delivery among loosely connected components to highly
predictable delivery within controlled environments (e.g., between
physically connected nodes, within a tightly controlled data
center). Each path through the network has a set of path
properties, e.g., available capacity, delay, and packet loss. Given
the range of networks that make up the Internet, these properties
range from largely static to highly dynamic.
This document provides guidelines for developing an understanding of one
path property: packet loss. In particular, we offer guidelines for
developing and implementing time-based loss detectors that have been
gradually learned over the last several decades. We focus on the general
case where the loss properties of a path are (a) unknown a priori and (b)
dynamically varying over time. Further, while there are numerous root
causes of packet loss, we leverage the conservative notion that loss is an
implicit indication of congestion. While this stance is not always correct, as a general
assumption it has historically served us well. As we discuss further below, the
guidelines in this document should be viewed as a general default for
unicast communication across best-effort networks and not as optimal -- or
even applicable -- for all situations.
Given that packet loss is routine in best-effort networks, loss
detection is a crucial activity for many protocols and applications
and is generally undertaken for two major reasons:
Ensuring reliable data delivery
This requires a data sender to develop an understanding of
which transmitted packets have not arrived at the receiver.
This knowledge allows the sender to retransmit missing
data.
Congestion control
As we mention above, packet loss is often taken as an
implicit indication that the sender is transmitting too fast and
is overwhelming some portion of the network path. Data senders
can therefore use loss to trigger transmission rate
reductions.
Various mechanisms are used to detect losses in a packet stream.
Often, we use continuous or periodic acknowledgments from the
recipient to inform the sender's notion of which pieces of data are
missing. However, despite our best intentions and most robust
mechanisms, we cannot place ultimate faith in receiving such
acknowledgments but can only truly depend on the passage of time.
Therefore, our ultimate backstop to ensuring that we detect all loss
is a timeout. That is, the sender sets some expectation for how
long to wait for confirmation of delivery for a given piece of data.
When this time period passes without delivery confirmation, the
sender concludes the data was lost in transit.
The specifics of time-based loss detection schemes represent a
tradeoff between correctness and responsiveness. In other words, we
wish to simultaneously:
wait long enough to ensure the detection of loss is correct,
and
minimize the amount of delay we impose on applications (before
repairing loss) and the network (before we reduce the
congestion).
Serving both of these goals is difficult, as they pull in opposite
directions. By not waiting long
enough to accurately determine a packet has been lost, we may provide a
needed retransmission in a timely manner but risk both sending unnecessary
("spurious") retransmissions and needlessly lowering the transmission rate.
By waiting long enough that we are unambiguously certain a packet has been
lost, we cannot repair losses in a timely manner and we risk prolonging
network congestion.
Many protocols and applications -- such as TCP, SCTP, and SIP
-- use their own time-based loss detection mechanisms.
At this point, our experience leads to a recognition that often specific
tweaks that deviate from standardized time-based loss detectors do not
materially impact network safety with respect to congestion control.
Therefore, in this document we outline a
set of high-level, protocol-agnostic requirements for time-based loss
detection. The intent is to provide a safe foundation on which
implementations have the flexibility to instantiate mechanisms that best
realize their specific goals.
Terminology
The key words "MUST", "MUST NOT",
"REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT",
"RECOMMENDED", "NOT RECOMMENDED",
"MAY", and "OPTIONAL" in this document are
to be interpreted as described in BCP 14 when, and only when, they appear in all capitals,
as shown here.
Context
This document is a departure from the way we ideally like to engineer
systems. Usually, we strive to understand high-level requirements
as a starting point. We then methodically engineer specific
protocols, algorithms, and systems that meet these requirements.
Within the IETF standards process, we have derived many time-based
loss detection schemes without the benefit of some over-arching
requirements document -- because we had no idea how to write such a
document! Therefore, we made the best specific decisions we could
in response to specific needs.
At this point, however, the community's experience has matured to
the point where we can define a set of general, high-level
requirements for time-based loss detection schemes. We now
understand how to separate the strategies these mechanisms use that
are crucial for network safety from those small details that do not
materially impact network safety. The requirements in this document
may not be appropriate in all cases. In particular, the guidelines
given below are concerned with the general case, but specific
situations may allow for more flexibility in terms of loss detection
because specific facets of the environment are known (e.g., when
operating over a single physical link or within a tightly controlled
data center). Therefore, variants, deviations, or wholly different
time-based loss detectors may be necessary or useful in some cases.
The correct way to view this document is as the default case and not
as one-size-fits-all guidance that is optimal in all cases.
Adding a requirements umbrella to a body of existing specifications
is inherently messy and we run the risk of creating inconsistencies
with both past and future mechanisms. Therefore, we make the
following statements about the relationship of this document to past
and future specifications:
This document does not update or obsolete any existing RFC. These
previous specifications -- while generally consistent with the
requirements in this document -- reflect community consensus, and this
document does not change that consensus.
The requirements in this document are meant to provide for network
safety and, as such, SHOULD be used by all future
time-based loss detection mechanisms.
The requirements in this document may not be appropriate in all
cases; therefore, deviations and variants may be necessary in the
future (hence the "SHOULD" in the last bullet).
However, inconsistencies MUST (a) be explained and (b)
gather consensus.
Scope
The principles we outline in this document are protocol-agnostic and
widely applicable. We make the following scope statements about
the application of the requirements given below:
While there are a bevy of uses for timers in
protocols -- from rate-based pacing to connection failure detection
and beyond -- this document is focused only on loss
detection.
The requirements for time-based loss detection
mechanisms in this document are for the primary or "last resort"
loss detection mechanism, whether the mechanism is the sole loss
repair strategy or works in concert with other mechanisms.
While a straightforward time-based loss detector is sufficient
for simple protocols like DNS, more
complex protocols often use more advanced loss detectors to aid
performance. For instance, TCP and SCTP have methods to detect
(and repair) loss based on explicit endpoint state sharing.
Such mechanisms often provide more timely and precise loss
detection than time-based loss detectors. However, these
mechanisms do not obviate the need for a "retransmission timeout"
or "RTO" because, as discussed above, only the passage
of time can ultimately be relied upon to detect loss. In other
words, we ultimately cannot count on acknowledgments to arrive at
the data sender to indicate which packets never arrived at the
receiver. In cases such as these, we need a time-based loss
detector to function as a "last resort".
Also, note that some recent proposals have incorporated time
as a component of advanced loss detection methods either as an
aggressive first loss detector in certain situations or in
conjunction with endpoint state sharing. While these mechanisms can aid timely loss
recovery, the protocol ultimately leans on another more
conservative timer to ensure reliability when these mechanisms
break down. The requirements in this document are only directly
applicable to last-resort loss detection. However, we expect that
many of the requirements can serve as useful guidelines for more
aggressive non-last-resort timers as well.
The requirements in this document apply only to
endpoint-to-endpoint unicast communication. Reliable multicast
protocols (e.g., NORM) are explicitly outside
the scope of this document.
Protocols such as SCTP and Multipath TCP (MP-TCP) that communicate in a unicast fashion with
multiple specific endpoints can leverage the requirements in this
document provided they track state and follow the requirements for
each endpoint independently. That is, if host A communicates with
addresses B and C, A needs to use independent time-based loss
detector instances for traffic sent to B and C.
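The per-endpoint independence described above can be sketched as follows. This is an illustrative fragment, not part of any specification; the names (PeerLossState, detector_for) are invented for the example:

```python
# Sketch: a host keeps an independent time-based loss detector per
# remote endpoint. Names here (PeerLossState, detector_for) are
# illustrative only, not drawn from any protocol specification.

class PeerLossState:
    def __init__(self):
        self.srtt = None  # smoothed feedback time; unknown initially
        self.rto = 1.0    # conservative initial RTO (see Requirements)

detectors = {}  # keyed by remote address

def detector_for(peer_addr):
    # Each peer gets its own state; samples from B never affect C.
    if peer_addr not in detectors:
        detectors[peer_addr] = PeerLossState()
    return detectors[peer_addr]

b = detector_for("192.0.2.1")    # peer B
c = detector_for("198.51.100.2") # peer C
b.srtt = 0.05            # an FT sample observed from peer B...
assert c.srtt is None    # ...leaves peer C's estimator untouched
```

The key design point is simply that the map is keyed by remote endpoint, so timing samples from one path never contaminate the estimator of another.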
There are cases where state is shared across connections or flows
(e.g., via TCP control block interdependence or the Congestion
Manager). State pertaining to time-based
loss detection is often discussed as sharable. These situations raise
issues that the simple flow-oriented time-based loss detection
mechanism discussed in this document does not consider (e.g., how long
to preserve state between connections). Therefore, while the general
principles given in the Requirements section are
likely applicable, sharing time-based loss detection information
across flows is outside the scope of this document.
Requirements
We now list the requirements that apply when designing primary or
last-resort time-based loss detection mechanisms. For historical
reasons and ease of exposition, we refer to the time between sending
a packet and determining the packet has been lost due to lack of
delivery confirmation as the "retransmission timeout" or "RTO".
After the RTO passes without delivery confirmation, the sender may
safely assume the packet is lost. However, as discussed above, the
detected loss need not be repaired (i.e., the loss could be detected
only for congestion control and not reliability purposes).
As we note above, loss detection happens when a sender does not
receive delivery confirmation within some expected period of
time. In the absence of any knowledge about the latency of a
path, the initial RTO MUST be conservatively set to no less
than
1 second.
Correctness is of the utmost importance when transmitting
into a network with unknown properties because:
Premature loss detection can trigger spurious retransmits
that could cause issues when a network is already
congested.
Premature loss detection can needlessly cause congestion
control to dramatically lower the sender's allowed
transmission rate, especially since the rate is already
likely low at this stage of the communication. Recovering
from such a rate change can take a relatively long time.
Finally, as discussed below, sometimes using time-based
loss detection and retransmissions can cause ambiguities in
assessing the latency of a network path. Therefore, it is
especially important for the first latency sample to be free
of ambiguities such that there is a baseline for the remainder
of the communication.
The specific constant (1 second) comes from analysis of Internet
round-trip times (RTTs) underlying TCP's retransmission timer
specification (RFC 6298).
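The conservative initialization above can be sketched as follows. The 1-second floor before any samples is the requirement; seeding the estimator from the first feedback-time (FT) sample mirrors TCP's scheme (RFC 6298) and is only one possible concretization:

```python
# Illustrative sketch of conservative RTO initialization.
# The 1-second initial RTO is the requirement above; seeding
# srtt/rttvar from the first unambiguous FT sample follows TCP's
# approach (RFC 6298) and is one possible choice, not a mandate.

INITIAL_RTO = 1.0  # seconds; MUST NOT be lower absent path knowledge

class RtoEstimator:
    def __init__(self):
        self.srtt = None      # smoothed FT; unknown before first sample
        self.rttvar = None    # FT variation estimate
        self.rto = INITIAL_RTO

    def first_sample(self, ft):
        # Seed the estimator from the first unambiguous FT sample.
        self.srtt = ft
        self.rttvar = ft / 2.0
        self.rto = self.srtt + 4.0 * self.rttvar

est = RtoEstimator()
assert est.rto == 1.0              # conservative before any FT sample
est.first_sample(0.1)              # 100 ms first feedback time
assert abs(est.rto - 0.3) < 1e-9   # 0.1 + 4 * 0.05
```

Note how a clean, unambiguous first sample immediately replaces the conservative default, which is why the text stresses keeping the first latency sample free of retransmission ambiguity.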
We now specify four requirements that pertain to setting an
expected time interval for delivery confirmation.
Often, measuring the time required for delivery confirmation is
framed as assessing the RTT of the network path: the RTT is the
minimum amount of time required to receive delivery confirmation,
and protocols often generate acknowledgments quickly after data
arrives. For instance, this is the case for the RTO used by TCP
and SCTP. However, framing the expected latency as the RTT is
somewhat misleading; it is better framed as
the "feedback time" (FT). In other words, the expectation is not
always simply a network property; it can include additional time
before a sender should reasonably expect a response.
For instance, consider a UDP-based DNS request from a client to a
recursive resolver. When the request can be
served from the resolver's cache, the feedback time (FT) likely
well approximates the
network RTT between the client and resolver. However, on a cache
miss, the resolver will request the needed information from one or more
authoritative DNS servers, which will non-trivially increase the FT
compared to the network RTT between the client and resolver.
Therefore, we express the requirements in terms of FT. Again, for
ease of exposition, we use "RTO" to indicate the interval between a
packet transmission and the decision that the packet has been
lost, regardless of whether the packet will be retransmitted.
The RTO SHOULD be set based on multiple
observations of the FT when available.
In other words, the RTO should represent an empirically derived
reasonable amount of time that the sender should wait for delivery
confirmation before deciding the given data is lost. Network paths are
inherently dynamic; therefore, it is crucial to incorporate multiple
recent FT samples in the RTO to take into account the delay variation
across time.
For example, TCP's RTO would satisfy this requirement due to its
use of an exponentially weighted moving average (EWMA) to
combine multiple FT samples into a "smoothed RTT". In the
name of conservativeness, TCP goes further to also include an
explicit variance term when computing the RTO.
While multiple FT samples are crucial for capturing the delay
dynamics of a path, we explicitly do not tightly specify the
process -- including the number of FT samples to use and how/when to age
samples out of the RTO calculation -- as the particulars could depend on
the situation and/or goals of each specific loss detector.
Finally, FT samples come from packet exchanges between peers. We
encourage protocol designers -- especially for new protocols -- to
strive to ensure the feedback is not easily spoofable by on- or off-path
attackers such that they can perturb a host's notion of the FT.
Ideally, all messages would be cryptographically secure, but given that
this is not always possible -- especially in legacy protocols -- using a
healthy amount of randomness in the packets is encouraged.
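TCP's EWMA-with-variance scheme is one concrete way to fold multiple FT samples into the RTO. The sketch below uses the gains from RFC 6298 (1/8 and 1/4) and the 4x variance multiplier; these are TCP's particular choices, not requirements of this document, and other loss detectors may weight or age samples differently:

```python
# One concrete instantiation of "incorporate multiple FT samples":
# TCP's EWMA plus explicit variance term (RFC 6298). The gains
# (1/8, 1/4) and the 4x multiplier are TCP's choices, shown here
# only as an example of a compliant estimator.

ALPHA = 1.0 / 8.0   # gain for the smoothed FT (SRTT analogue)
BETA = 1.0 / 4.0    # gain for the variation term

def update_rto(srtt, ftvar, ft_sample):
    """Return (srtt, ftvar, rto) after incorporating one FT sample."""
    ftvar = (1 - BETA) * ftvar + BETA * abs(srtt - ft_sample)
    srtt = (1 - ALPHA) * srtt + ALPHA * ft_sample
    rto = srtt + 4.0 * ftvar
    return srtt, ftvar, rto

# A stable path: repeated 100 ms samples shrink the variance term,
# so the RTO converges toward the smoothed FT.
srtt, ftvar = 0.1, 0.05
for _ in range(50):
    srtt, ftvar, rto = update_rto(srtt, ftvar, 0.1)
assert abs(srtt - 0.1) < 1e-6 and rto < 0.11
```

Because the variance term decays on a steady path and grows when samples jitter, the resulting RTO is responsive on stable paths while remaining conservative on variable ones.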
FT observations SHOULD be taken and
incorporated into the RTO at
least once per RTT or as frequently as data is exchanged in cases where
that happens less frequently than once per RTT.
Internet measurements show that taking only a single FT sample per
TCP connection results in a relatively poorly performing RTO mechanism,
hence this requirement that the FT be sampled
continuously throughout the lifetime of communication.
As an example, TCP takes an FT sample roughly once per RTT,
or, if using the timestamp option, on each acknowledgment
arrival. Measurements show that both these
approaches result in roughly equivalent performance for the
RTO estimator.
FT observations MAY be taken from non-data
exchanges.
Some protocols use non-data exchanges for various reasons,
e.g.,
keepalives, heartbeats, and control messages. To the extent that the
latency of these exchanges mirrors data exchange, they can be leveraged
to take FT samples within the RTO mechanism. Such samples can help
protocols keep their RTO accurate during lulls in data transmission.
However, given that these messages may not be subject to the same delays
as data transmission, we do not take a general view on whether this is
useful or not.
An RTO mechanism MUST NOT use ambiguous FT
samples.
Assume two copies of some packet X are transmitted at
times t0 and t1. Then, at time t2, the sender receives
confirmation that X in fact arrived. In some cases, it is not
clear which copy of X triggered the confirmation; hence, the
actual FT is either t2-t1 or t2-t0, but which is a mystery.
Therefore, in this situation, an implementation MUST NOT
use either version of the FT sample and hence not
update the RTO (following Karn's algorithm).
There are cases where two copies of some data are
transmitted in a way whereby the sender can tell which is
being acknowledged by an incoming ACK. For example, TCP's
timestamp option allows for packets to be
uniquely identified and hence avoid the ambiguity. In such
cases, there is no ambiguity and the resulting samples can
update the RTO.
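The ambiguity rule above can be sketched as a simple filter. This is an illustrative fragment; the function name and the simplification that a disambiguated confirmation maps to the most recent copy are assumptions for the example, not protocol mandates:

```python
# Sketch of the ambiguous-sample rule: an FT sample is discarded
# when the packet was transmitted more than once and the delivery
# confirmation cannot be tied to a specific copy.

def ft_sample(transmit_times, confirm_time, copy_identified=False):
    """Return an FT sample in seconds, or None if ambiguous.

    transmit_times: times each copy of the packet was sent (t0, t1, ...).
    copy_identified: True when a unique per-copy marker (e.g., TCP's
    timestamp option) pins the confirmation to a specific copy; for
    simplicity this sketch assumes that copy is the last transmission.
    """
    if len(transmit_times) == 1:
        return confirm_time - transmit_times[0]   # unambiguous
    if copy_identified:
        return confirm_time - transmit_times[-1]  # disambiguated
    return None  # retransmitted and ambiguous: MUST NOT update the RTO

assert abs(ft_sample([10.0], 10.2) - 0.2) < 1e-9       # single copy: usable
assert ft_sample([10.0, 11.0], 11.3) is None           # ambiguous: dropped
assert abs(ft_sample([10.0, 11.0], 11.3,
                     copy_identified=True) - 0.3) < 1e-9
```

A caller would simply skip the RTO update whenever this filter returns None, leaving the existing estimate (and any backoff) in place.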
Loss detected by the RTO mechanism MUST be taken
as an indication of network congestion and the sending rate adapted
using a standard mechanism (e.g., TCP collapses the congestion
window to one packet).
This ensures network safety.
An exception to this rule is if an IETF standardized mechanism
determines that a particular loss is due to a non-congestion event
(e.g., packet corruption). In such a case, a congestion control action
is not required. Additionally, congestion control actions taken based
on time-based loss detection could be reversed when a standard mechanism
post facto determines that the cause of the loss was not congestion
(e.g., via the Eifel response algorithm).
Each time the RTO is used to detect a loss, the value of the RTO
MUST
be exponentially backed off such that the next firing requires a longer
interval. The backoff SHOULD be removed after either (a)
the subsequent
successful transmission of non-retransmitted data, or (b) an RTO passes
without detecting additional losses. The former will generally be
quicker. The latter covers cases where loss is detected but not
repaired.
A maximum value MAY be placed on the RTO. The
maximum RTO MUST NOT be less than 60 seconds (as
specified for TCP's retransmission timer in RFC 6298).
This ensures network safety.
As with guideline (3), an exception to this rule exists if an IETF
standardized mechanism determines that a particular loss is not due to
congestion.
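The backoff and maximum requirements above can be sketched together. Doubling is the conventional exponential backoff, and the 60-second figure is the smallest maximum the requirement permits, not a recommendation; both choices here are illustrative:

```python
# Sketch of RTO backoff with an optional maximum. Doubling is the
# conventional exponential backoff; 60 seconds is the lowest cap the
# requirement allows, used here purely for illustration.

MAX_RTO = 60.0  # seconds; a cap, if used, MUST NOT be below this

def on_rto_expired(rto):
    # Each RTO-detected loss backs the timer off exponentially, so
    # repeated losses probe the path progressively less aggressively.
    return min(rto * 2.0, MAX_RTO)

def on_backoff_cleared(base_rto):
    # The backoff is removed after either (a) non-retransmitted data
    # is successfully delivered, or (b) an RTO interval passes with
    # no further loss detected.
    return base_rto

rto = 1.0
for _ in range(10):
    rto = on_rto_expired(rto)
assert rto == 60.0                     # growth is capped at the maximum
assert on_backoff_cleared(1.0) == 1.0  # backoff removed, base RTO restored
```

The cap bounds how long a sender can sit idle before re-probing the path, while the exponential growth preserves the fail-safe behavior the Discussion section relies on.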
Discussion
We note that research has shown the tension between the
responsiveness and correctness of time-based loss detection seems to
be a fundamental tradeoff in the context of TCP. That is,
making the RTO more aggressive (e.g., via changing TCP's
exponentially weighted moving average (EWMA) gains, lowering the
minimum RTO, etc.) can reduce the time required to detect actual
loss. However, at the same time, such aggressiveness leads to more
cases of mistakenly declaring packets lost that ultimately arrived
at the receiver. Therefore, being as aggressive as the requirements
given in the previous section allow in any particular situation may
not be the best course of action because detecting loss, even if
falsely, carries a requirement to invoke a congestion response
that will ultimately reduce the transmission rate.
While the tradeoff between responsiveness and correctness seems
fundamental, the tradeoff can be made less relevant if the sender can
detect and recover from mistaken loss detection. Several mechanisms have
been proposed for this purpose, such as Eifel, Forward RTO-Recovery (F-RTO), and Duplicate Selective Acknowledgement (DSACK). Using such
mechanisms may allow a data originator to tip towards being more responsive
without incurring (as much of) the attendant costs of mistakenly declaring
packets to be lost.
Also, note that, in addition to the experiments discussed in the
literature, the Linux TCP implementation has been using various
non-standard RTO mechanisms for many years seemingly without
large-scale problems (e.g., using different EWMA gains than
specified in RFC 6298).
Further, a number of TCP implementations use a steady-state minimum
RTO that is less than the 1 second specified in RFC 6298. While
the implication of these deviations from the standard may be more
spurious retransmits, we are aware of no large-scale
network safety issues caused by this change to the minimum RTO.
This informs the guidelines in the last section (e.g., there is no
minimum RTO specified).
Finally, we note that while allowing implementations to be more
aggressive could in fact increase the number of needless
retransmissions, the above requirements fail safely in that they
insist on exponential backoff and a transmission rate reduction.
Therefore, providing implementers more latitude than they have
traditionally been given in IETF specifications of RTO mechanisms
does not somehow open the flood gates to aggressive behavior. Since
there is a downside to being aggressive, the incentives for proper
behavior are retained in the mechanism.
Security Considerations
This document does not alter the security properties of time-based
loss detection mechanisms. See RFC 6298 for a discussion of these
within the context of TCP.
IANA Considerations
This document has no IANA actions.
References
Normative References
"Key Words for Use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119.
"Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174.
Informative References
"On Estimating End-to-End Network Path Properties", Proceedings of the ACM SIGCOMM Technical Symposium.
"The RACK-TLP Loss Detection Algorithm for TCP", Work in Progress.
"Tail Loss Probe (TLP): An Algorithm for Fast Recovery of Tail Losses", Work in Progress.
"QUIC Loss Detection and Congestion Control", Work in Progress.
"Congestion Avoidance and Control", ACM SIGCOMM.
"Improving Round-Trip Time Estimates in Reliable Transport Protocols", ACM SIGCOMM '87.
"Domain Names - Concepts and Facilities", RFC 1034.
"Domain Names - Implementation and Specification", RFC 1035.
"TCP Selective Acknowledgment Options", RFC 2018.
"TCP Control Block Interdependence", RFC 2140.
"An Extension to the Selective Acknowledgement (SACK) Option for TCP", RFC 2883.
"The Congestion Manager", RFC 3124.
"SIP: Session Initiation Protocol", RFC 3261.
"The Eifel Detection Algorithm for TCP", RFC 3522.
"Using TCP Duplicate Selective Acknowledgement (DSACKs) and Stream Control Transmission Protocol (SCTP) Duplicate Transmission Sequence Numbers (TSNs) to Detect Spurious Retransmissions", RFC 3708.
"Stream Control Transmission Protocol", RFC 4960.
"TCP Congestion Control", RFC 5681.
"Forward RTO-Recovery (F-RTO): An Algorithm for Detecting Spurious Retransmission Timeouts with TCP", RFC 5682.
"NACK-Oriented Reliable Multicast (NORM) Transport Protocol", RFC 5740.
"Architectural Guidelines for Multipath TCP Development", RFC 6182.
"Computing TCP's Retransmission Timer", RFC 6298.
"A Conservative Loss Recovery Algorithm Based on Selective Acknowledgment (SACK) for TCP", RFC 6675.
[STANDARDS-TRACK]TCP Extensions for High PerformanceThis document specifies a set of TCP extensions to improve performance over paths with a large bandwidth * delay product and to provide reliable operation over very high-speed paths. It defines the TCP Window Scale (WS) option and the TCP Timestamps (TS) option and their semantics. The Window Scale option is used to support larger receive windows, while the Timestamps option can be used for at least two distinct mechanisms, Protection Against Wrapped Sequences (PAWS) and Round-Trip Time Measurement (RTTM), that are also described herein.This document obsoletes RFC 1323 and describes changes from it.Acknowledgments
This document benefits from years of discussions with , , , , , and the members of the TCPM and TCPIMPL Working Groups. , , , , , , , , , , , , and provided useful comments on previous draft versions of this document.

Author's Address

International Computer Science Institute
2150 Shattuck Ave., Suite 1100
Berkeley, CA
United States of America
94704

Email: mallman@icir.org
URI: https://www.icir.org/mallman