TCPM Working Group                                            S. Schuetz
Internet-Draft                                                       NEC
Intended status: Experimental                              N. Koutsianas
Expires: August 25, 2008                                       L. Eggert
                                                                   Nokia
                                                                 W. Eddy
                                                                 Verizon
                                                                Y. Swami
                                                                   Nokia
                                                                   K. Le
                                                                     NSN
                                                       February 22, 2008

       TCP Response to Lower-Layer Connectivity-Change Indications
                     draft-schuetz-tcpm-tcp-rlci-03

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.
   This document may not be modified, and derivative works of it may not
   be created, except to publish it as an RFC and to translate it into
   languages other than English.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on August 25, 2008.

Copyright Notice

   Copyright (C) The IETF Trust (2008).

Abstract

   When the path characteristics between two hosts change abruptly, TCP
   can experience significant delays before resuming transmission in an
   efficient manner or TCP can behave unfairly to competing traffic.
   This document describes TCP extensions that improve transmission
   behavior in response to advisory, lower-layer connectivity-change
   indications.  The proposed TCP extensions modify the local behavior
   of TCP and introduce a new TCP option to signal locally received
   connectivity-change indications to remote peers.
   Performance gains result from a more efficient transmission behavior
   and there is no difference in aggressiveness in comparison to a
   newly-started connection.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Motivation and Overview
   4.  Connectivity-Change Indications
   5.  TCP Response to Connectivity-Change Indications (CCIs)
     5.1.  Connectivity-Change Indication (CCI) TCP Option
     5.2.  Generation and Processing of Connectivity-Change
           Indication TCP Options
     5.3.  Re-Probing Path Characteristics
     5.4.  Speculative Retransmission
   6.  Discussion
     6.1.  Triggered Segment Transmission during Steady-State
     6.2.  Impact of Packet Loss
     6.3.  Use of Limited Transmit with RLCI
     6.4.  Simultaneous Processing of Connectivity-Change
           Indications
   7.  Security Considerations
   8.  IANA Considerations
   9.  Acknowledgments
   10. References
     10.1. Normative References
     10.2. Informative References
   Editorial Comments
   Appendix A.  Background: Classification of Connectivity
                Disruptions
     A.1.  Short Connectivity Disruptions
     A.2.  Long Connectivity Disruptions
   Appendix B.  Document Revision History
   Authors' Addresses
   Intellectual Property and Copyright Statements

1.  Introduction

   The Transmission Control Protocol (TCP) [RFC0793] generally assumes
   that the end-to-end path between two hosts has characteristics that
   are relatively stable over the lifetime of a connection.  Although
   TCP's congestion control algorithms [RFC2581] can adapt to changes to
   the path characteristics after several round-trip times, they fail to
   support efficient operation in the few round-trip times immediately
   after a significant path change.  This is due to the granularity of
   TCP's sampling mechanisms.  Significant changes to path connectivity
   include loss or reestablishment of connectivity, and drastic, abrupt
   changes in round-trip time (RTT) or available bandwidth.
   Connectivity changes that occur on such short time-scales are
   becoming more common, due to host mobility or intermittent network
   attachment.

   This document describes a set of complementary TCP extensions that
   improve behavior when transmitting over paths whose characteristics
   can change on short time-scales.  TCP implementations that support
   these extensions respond to receiving generic, link-technology-
   independent, per-connection connectivity-change indications from
   lower layers.  A connectivity-change indication signals that the
   characteristics of the end-to-end path between the local node and its
   peer have changed in some undefined way.  The response mechanisms
   proposed for TCP act on this information in a conservative fashion.
   The specific response depends on the current state of a connection
   when a connectivity-change indication is received.

   It is important to note that this addition of response mechanisms to
   lower-layer information follows an established precedent.  TCP
   and other transport protocols already react to information and
   signals from lower layers; the proposed connectivity-change
   indications thus extend an established interface between layers in
   the protocol stack.  TCP measures the end-to-end path to implicitly
   derive network-layer information.  TCP also directly reacts to
   network-layer signals delivered via ICMP, for example, "Port
   Unreachable" or the now-deprecated "Source Quench" [RFC1122].
   Explicit Congestion Notification (ECN) [RFC3168] and Quick-Start
   [RFC4782] are other sources of network-layer information for which
   response mechanisms for TCP have been defined.  Connectivity-change
   indications are yet another source of lower-layer information that
   TCP can use to improve its operation.

   A second important point to note is that the TCP response mechanisms
   to connectivity-change indications are purely optional efficiency
   improvements.  In the absence of connectivity-change indications, a
   TCP that implements these changes behaves identically to an
   unmodified TCP.  When lower layers provide connectivity-change
   indications that trigger the response mechanisms, they enhance TCP
   operation based on the explicit lower-layer information that is
   signaled.  These response mechanisms do not increase the
   aggressiveness of TCP.

   Note that the IAB has recently described architectural issues of
   "link indications" [RFC4907].  The authors feel that this term is not
   quite accurate in this environment, because transport mechanisms
   should remain link-technology-agnostic.  However, transport protocols
   have always acted on network-layer information and signals, such as
   measured path characteristics or ICMP-signaled conditions.
   Because of the growing proliferation of shim layers between the
   traditional network and transport layers, this document uses the
   term "lower-layer indication" to remain independent of specific
   network or shim layers.

   Note that it is currently an open question as to whether additional
   lower-layer indications can provide further information to transport
   protocols.  Also, this document only describes response mechanisms
   for TCP, although other transport protocols may benefit from similar
   response mechanisms to react to connectivity-change indications.

2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

   The following abbreviations are used throughout the document:

   +------+---------------------------------------------------------+
   | CCI  | Connectivity-Change Indication                          |
   | RLCI | Response to Lower-layer Connectivity-change Indications |
   +------+---------------------------------------------------------+

                          Table 1: Abbreviations

3.  Motivation and Overview

   Several proposed network-layer extensions support host mobility,
   including Mobile IPv4 [RFC3344], Mobile IPv6 [RFC3775] and HIP
   [I-D.ietf-hip-mm].  Typically, they shield transport-layer protocols
   from mobility events and enable them to sustain established
   connections across mobility events.  However, the path
   characteristics that established connections experience after a
   mobility event may have changed drastically and on short time-scales.

   Congestion control, RTT and path-MTU state gathered over an old path
   before the move generally have no meaning for the new path.  Because
   TCP uses stale information when resuming transmission over the new
   path, it can be either too aggressive or highly inefficient.
   Similar conditions may be found when fail-overs occur for multihomed
   hosts through the shim6 protocol.  Appendix A gives background on
   the scenarios that the mechanisms described in this document are
   designed for.

   TCP already forces a slow-start restart in some cases where the
   network state becomes unknown, such as after an idle period or heavy
   losses.  A first part of the response specified in this document
   involves a similar return to initial slow-start state in response to
   connectivity-change indications that are received while a connection
   is transmitting in steady-state.  Note that this behavior is more
   conservative than the standard TCP response or lack of response.
   Some performance gains with the proposed mechanisms are due to either
   avoiding overloading the new path, which typically incurs an RTO, or
   using slow-start to quickly detect new capacity far above the
   previous steady-state operating point.

   A second response component improves TCP operation in the presence of
   temporary connectivity disruptions.  These disruptions can occur
   independently of mobility events and, for example, may be due to
   insufficient wireless access coverage or nomadic computer use.
   Connectivity disruptions can severely decrease TCP performance.  The
   main reason for this decrease is TCP's retransmission behavior after
   a connectivity disruption [SCHUETZ].  TCP uses periodic
   retransmission attempts in exponentially increasing intervals, which
   can unnecessarily delay retransmissions after connectivity returns.
   In the extreme case, TCP connections can even abort, if the
   disruption is longer than the TCP "user timeout".  (Connection aborts
   are out of scope for this document but can be prevented by the TCP
   User Timeout Option [I-D.ietf-tcpm-tcp-uto].)
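
   The delay caused by exponentially increasing retransmission
   intervals can be illustrated with a small calculation.  The Python
   sketch below is not part of the proposed mechanisms; the initial RTO
   of 1 second, the 64-second cap and the function names are
   assumptions chosen only for illustration.

```python
def retransmit_times(initial_rto, max_rto, count):
    """Times (seconds after the first loss) at which an unmodified
    sender retransmits, doubling the RTO after every attempt.
    initial_rto and max_rto are illustrative values, not taken from
    the draft."""
    times, now, rto = [], 0.0, initial_rto
    for _ in range(count):
        now += rto
        times.append(now)
        rto = min(rto * 2, max_rto)
    return times


def idle_wait(times, restored_at):
    """Seconds the sender stays idle between connectivity returning
    at restored_at and its next scheduled retransmission."""
    for t in times:
        if t >= restored_at:
            return t - restored_at
    return None  # no retransmission scheduled after restored_at


schedule = retransmit_times(1.0, 64.0, 6)
wait = idle_wait(schedule, 32.0)
```

   With these assumed values, connectivity that returns shortly after
   a failed retransmission attempt leaves the sender idle for almost a
   full back-off interval; this idle time is what an immediate
   retransmission on a connectivity-change indication would avoid.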

   This second response action executes when receiving a connectivity-
   change indication while a connection is stalled in exponential back-
   off.  It improves TCP retransmission behavior after connectivity is
   restored through an immediate speculative retransmission attempt
   [footnote-1].  Similar to the first response component, the second
   one also increases TCP performance through a more intelligent
   transmission behavior that uses periods of connectivity more
   efficiently.  In comparison to startup of a new connection, it does
   not cause significant amounts of additional traffic and it does not
   change TCP's congestion control algorithms.

   Finally, this draft specifies a third response component, which is a
   new TCP option that notifies the connection's remote peer of a
   connectivity-change event detected locally.  This is useful because
   connectivity-change indications typically require appropriate
   responses at both ends of a connection, but may only be received or
   detected by one end.  The other parts of the response to a
   connectivity-change indication are independent of the indication's
   source (locally notified or remotely signaled) and depend only on the
   specific indication and the state of the connection for which it was
   received.

4.  Connectivity-Change Indications

   The focus of this document is on specifying TCP response mechanisms
   to lower-layer connectivity-change indications.  This section briefly
   describes how different network- and shim-layer mechanisms underneath
   the transport layer may provide these connectivity-change indications
   to TCP.  This section is included for clarification only; details on
   connectivity indication sources are out of scope of this document.

   When lower layers detect a connectivity-change event, they generate
   corresponding connectivity-change indications.
   Lower-layer events that could trigger such an indication include (but
   are not limited to):

   o  the IP address of the local outbound interface used for a given
      connection has changed, e.g., due to DHCP [RFC2131] or IPv6 router
      advertisements [RFC2460];

   o  link-layer connectivity of the local outbound interface used for a
      given connection has changed, e.g., link-layer "link up" event
      [RFC4957];

   o  the local outbound interface used for a given connection has
      changed, due to routing changes or link-layer connectivity changes
      at other interfaces (including tunnel establishment or teardown,
      e.g., in response to IKE events [RFC4306]);

   o  a Mobile IP binding update has completed [RFC3775];

   o  a HIP readdressing update has completed [I-D.ietf-hip-mm];

   o  a path-change signal from the network has arrived (possible in
      theory, depends on network capabilities);

   o  other notifications as defined by the IETF's Detecting Network
      Attachment (DNA) working group have occurred [RFC4957].

   Note that the list above only describes some potential sources for
   connectivity-change events.  Other sources exist, but the details on
   when to generate such events are out of the scope of this document,
   which focuses on the TCP response mechanisms when such events are
   received.

5.  TCP Response to Connectivity-Change Indications (CCIs)

   A TCP connection can receive a connectivity-change indication (CCI)
   either from its local stack ("local CCI") or through a new
   "connectivity-change indication TCP option" from its peer ("remote
   CCI").  Section 5.1 specifies this new TCP option.  In either case,
   upon reception of a CCI, the TCP RLCI (Response to Lower-layer
   Connectivity-change Indications) mechanisms defined in this document
   immediately re-probe path characteristics.
   They do this by either performing a speculative retransmission or by
   sending a single segment of new data or a pure ACK, depending on
   whether the connection is currently stalled in exponential back-off
   or transmitting in steady-state, respectively.  A connection is
   "stalled in exponential back-off" if at least one segment was
   retransmitted due to an RTO expiration but has not been ACK'ed yet.

   The remainder of this section first defines the format of the new CCI
   TCP option in Section 5.1 and its processing in Section 5.2.  After
   that, the two TCP response mechanisms triggered by receiving CCIs -
   re-probing path characteristics and speculative retransmission - are
   described in Section 5.3 and Section 5.4.

   The TCP RLCI mechanisms defined in this document depend on the TCP
   Timestamps option (TSopt) [RFC1323].  Consequently, it is REQUIRED
   that an end host that wishes to use the RLCI mechanisms for a TCP
   connection negotiate the use of TCP Timestamps options with its peer.
   If this negotiation fails, a host MUST NOT use the RLCI mechanisms
   for a connection.  TCP Timestamps options are needed by the RLCI
   mechanisms during the following operations:

   o  To re-probe the path characteristics after a connectivity-change
      indication.  A host uses the TS Echo Reply (TSecr) field of a TCP
      Timestamps option to distinguish whether incoming ACKs are for
      segments that have been transmitted before or after the CCI.

   o  To identify a new remote CCI.  A host uses the TS Value (TSval)
      field of an incoming TCP Timestamps option to distinguish a new
      remote CCI from the delayed reception of an old one.  As a result,
      the last remote CCI is defined as the one received with the
      highest TS Value.

   Section 5.2 and Section 5.3 give more details about how the RLCI
   mechanisms use TCP Timestamps options.
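
   The two uses of TCP Timestamps listed above reduce to timestamp
   comparisons.  The following Python sketch shows one possible way to
   write them.  The helper names are hypothetical, the 32-bit
   wraparound handling is an assumption that the text above does not
   spell out, and so are the choices to express LAST_CCI_TIME in units
   of the local timestamp clock and to treat an echo equal to it as
   belonging to a segment sent after the CCI.

```python
TS_MODULUS = 1 << 32  # TSval and TSecr are 32-bit fields


def ts_later(a, b):
    """True if timestamp a is later than b, tolerating 32-bit
    wraparound (an assumption of this sketch)."""
    return 0 < ((a - b) % TS_MODULUS) < (1 << 31)


def is_new_remote_tsval(tsval, remote_cci_peer_time):
    """A remote CCI is only considered new if its segment carries a
    TSval later than the one stored for the last remote CCI."""
    return ts_later(tsval, remote_cci_peer_time)


def ack_is_post_cci(tsecr, last_cci_time):
    """Classify an incoming ACK by its TSecr: does it acknowledge a
    segment sent at or after the last CCI?  Assumes last_cci_time is
    kept in the same units as the local timestamp clock."""
    return tsecr == last_cci_time or ts_later(tsecr, last_cci_time)
```

   Only ACKs classified as post-CCI would then contribute to the path
   state that is rebuilt after the indication.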

   An implementation of the RLCI mechanisms defined in this document
   maintains nine new state variables per TCP connection.  [footnote-2]

   LOCAL_CCI
      It is a 1-bit counter, having an initial value of 0.  It is used
      to detect the existence of a new local CCI.  It changes its value
      every time a new local CCI received from the local stack starts
      being processed.

   REMOTE_CCI
      It holds a copy of the last CCI value advertised by the peer
      through a CCI TCP option.  This is a 1-bit counter that is
      initialized to 0 and updated in response to remote CCIs according
      to the rules defined in Section 5.2.

   LOCAL_CCI_STATUS
      It holds the status of the processing of local CCIs.  It can have
      three possible values: LOCAL_CCI_IDLE (0), LOCAL_CCI_NEW (1),
      LOCAL_CCI_ECHO_ACK (2).  The initial value is LOCAL_CCI_IDLE.

   REMOTE_CCI_STATUS
      It holds the status of the processing of the last remote CCI
      advertised by the peer through a CCI TCP option.  It can have two
      possible values: REMOTE_CCI_IDLE (0), REMOTE_CCI_ECHO (1).  The
      initial value is REMOTE_CCI_IDLE.

   LAST_CCI_TIME
      It holds the local time when the last CCI (either local or remote)
      was received.  It is updated every time either LOCAL_CCI or
      REMOTE_CCI is modified.

   REMOTE_CCI_PEER_TIME
      This variable is used to distinguish new remote CCIs from
      retransmissions of past ones.  It holds the TS Value (TSval) of
      the Timestamps option of the segment advertising the last remote
      CCI.  It is initialized when receiving the first segment from the
      peer and it is updated every time REMOTE_CCI is modified.

   LOCAL_CCI_PEER_ECHO_TIME
      This variable is used to distinguish the echo of a new local CCI
      from delayed retransmissions of echoes of older local CCIs.  It
      holds the TS Value (TSval) of the Timestamps option of the
      segment that echoed the last local CCI.
      It is initialized when receiving the first segment from the peer
      and it is updated every time LOCAL_CCI_STATUS changes from
      LOCAL_CCI_NEW to LOCAL_CCI_ECHO_ACK.

   CCI_SNDMAX
      Retains the highest sequence number transmitted when the most
      recent CCI (either local or remote) was received.

   CCI_CONTROLLED_CWND
      It is a Boolean variable that sets an additional condition
      controlling the increment of TCP's congestion window (CWND).
      Having an initial value of false, it is updated according to the
      rules defined in Section 5.2.

5.1.  Connectivity-Change Indication (CCI) TCP Option

   Connectivity-change indications (CCIs) are generally asymmetric,
   i.e., they may occur or be detected by one end but not the other.
   The basic idea behind the CCI option is to signal the occurrence of
   local CCIs to the other end, so that the other end can also respond
   appropriately.  Note that this assumes that paths will generally be
   symmetric, meaning that a CCI received by one end for its path to
   the other end will imply that the characteristics of the reverse
   path have changed, too.

                           1                   2
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3
      +---------------+---------------+-----+-+-+---+-+
      |               |               |  R  | | |   |E|
      |   Kind = X    |  Length = 3   |  E  |C|E| C |C|
      |               |               |  S  | |C| S |S|
      +---------------+---------------+-----+-+-+---+-+

    Figure 1: Format of the connectivity-change indication TCP option.

   Figure 1 shows the format of the CCI option.  It contains these
   fields:

   Kind (8 bits)
      The TCP option number X [RFC0793] allocated by IANA upon
      publication of this document (see Section 8).

   Length (8 bits)
      Length of the TCP option in octets [RFC0793]; its value MUST be 3.

   RES (3 bits)
      Reserved bits.  The sender SHOULD set these to zero and the
      receiver MUST ignore them.

   C (1 bit)
      Current value of LOCAL_CCI of the end sending the option.

   EC (1 bit)
      Echoed value of C, i.e., the current value of REMOTE_CCI of the
      end sending the option.

   CS (2 bits)
      Current value of LOCAL_CCI_STATUS of the end sending the option.

   ECS (1 bit)
      Current value of REMOTE_CCI_STATUS of the end sending the option.

   The CCI option contains two single-bit fields (C and EC) used to
   distinguish new CCIs from delayed retransmissions of past ones.  It
   also contains some flags representing the status of each CCI
   processing.  These flags are used for a 3-way handshake ensuring that
   both parties have been informed of a new CCI.  At the beginning of a
   connection, LOCAL_CCI and REMOTE_CCI MUST be set to 0.
   LOCAL_CCI_STATUS and REMOTE_CCI_STATUS MUST be set to LOCAL_CCI_IDLE
   and REMOTE_CCI_IDLE, respectively.

   A host actively opening a connection and wishing to use the CCI
   option for that connection MUST include a CCI option in its SYN
   segment with C := 0, CS := LOCAL_CCI_IDLE, EC := 0 and ECS :=
   REMOTE_CCI_IDLE in order to advertise support for the TCP CCI option.
   A host receiving a SYN segment MUST NOT include a CCI option in its
   SYN-ACK or any subsequent segment, unless it has received a CCI
   option in the corresponding SYN.  In case a host has received a CCI
   option in the SYN segment, it MUST echo that CCI option in its SYN-
   ACK segment, i.e., it MUST set C := 0, CS := LOCAL_CCI_IDLE, EC := 0
   and ECS := REMOTE_CCI_IDLE.  A host MUST NOT process any following
   CCI options unless one was included in both the SYN and SYN-ACK and
   both peers have enabled TCP Timestamps for the connection.
   Section 5.2.1 and Section 5.2.2 describe the processing rules in
   detail.

   A host MUST send a CCI option in all outgoing segments whenever
   LOCAL_CCI_STATUS is not LOCAL_CCI_IDLE or REMOTE_CCI_STATUS is not
   REMOTE_CCI_IDLE (or both).
   A host MUST NOT send a CCI option when LOCAL_CCI_STATUS is
   LOCAL_CCI_IDLE and REMOTE_CCI_STATUS is REMOTE_CCI_IDLE, i.e., when
   the host is not currently processing any CCI.  The only exceptions to
   that rule are SYN and SYN-ACK segments.  Whenever sending any CCI
   option, C MUST be set to the current LOCAL_CCI, EC MUST be set to the
   current REMOTE_CCI, CS MUST be set to LOCAL_CCI_STATUS and ECS MUST
   be set to REMOTE_CCI_STATUS.

5.2.  Generation and Processing of Connectivity-Change Indication TCP
      Options

   Processing of a connectivity-change indication can be separated into
   two parts:

   1.  Processing in "initiator" mode, i.e., when a host receives a
       local CCI and (reliably) forwards it to the other end through a
       CCI option.

   2.  Processing in "responder" mode, i.e., when a host receives a
       remote CCI in a CCI option from the other end.

   Section 5.2.1 and Section 5.2.2 describe the state machines at an
   initiator and a responder, respectively.  Note that a single host can
   be both - initiator and responder - at the same time.  This can
   happen if a local CCI occurs while processing for a remote CCI is
   ongoing, or vice versa.

   The following events, conditions and actions are used in the
   definition of the two state machines:

   Events:

   E_LOCAL_CCI
      Local end received a local CCI.

   E_REMOTE_CCI
      Local end received information about a remote CCI, i.e., received
      a TCP segment that includes a CCI option.

   E_SEGMENT_SENT
      Local end sent a TCP segment that includes the CCI option.

   Conditions:

   C_NEW_REMOTE_CCI
      A received CCI option signals a new remote CCI, i.e., C !=
      REMOTE_CCI, CS == LOCAL_CCI_NEW and the TSval of the Timestamps
      option of the received segment is greater than the current
      REMOTE_CCI_PEER_TIME (TSval > REMOTE_CCI_PEER_TIME).

   C_ECHOED_LOCAL_CCI
      A received CCI option echoes the last local CCI, i.e., EC ==
      LOCAL_CCI, ECS == REMOTE_CCI_ECHO and the TSval of the Timestamps
      option of the received segment is greater than the current
      LOCAL_CCI_PEER_ECHO_TIME (TSval > LOCAL_CCI_PEER_ECHO_TIME).

   C_ECHOED_REMOTE_CCI
      A received CCI option acknowledges that the peer has received the
      echo of its last local CCI, i.e., C == REMOTE_CCI, CS ==
      LOCAL_CCI_ECHO_ACK and the TSval of the Timestamps option of the
      received segment is greater than the current REMOTE_CCI_PEER_TIME
      (TSval > REMOTE_CCI_PEER_TIME).

   Actions:

   A_TGL_LOCAL_CCI
      Toggle LOCAL_CCI.

   A_TGL_REMOTE_CCI
      Toggle REMOTE_CCI.

   A_REPROBE_PATH
      TCP discards all congestion control information gathered on the
      current path, resets it to the defaults and re-probes path
      characteristics based only on the segments transmitted after this
      event, as described in Section 5.3.  In other words,
      CCI_CONTROLLED_CWND := true, LAST_CCI_TIME := current local time,
      CCI_SNDMAX := highest sequence number transmitted so far and the
      congestion control state (CWND and SS_THRESH), round-trip time
      measurement (RTTM) state and RTO timer are reset to the initial
      values for a new connection.  Additionally, if the connection is
      stalled in exponential back-off, TCP MUST act as if RTO had
      expired and start the speculative retransmission procedure
      described in Section 5.4.

   A_FORCE_SEND
      Force transmission of a segment that MUST include a CCI option, in
      order to inform the other peer about the local CCI.  If the
      connection is stalled in exponential back-off, this is taken care
      of by the speculative retransmission procedure described in
      Section 5.4.  If the connection is in steady-state and there is
      new data to be sent, TCP MUST immediately send a single segment of
      new data including a CCI option.
      If there is no new data to be sent, TCP MUST immediately send a
      pure ACK including a CCI option.

   A_UPD_CCI_PEER_TIME
      Set REMOTE_CCI_PEER_TIME to the TSval value of the TCP Timestamps
      option of the received segment.

   A_UPD_CCI_PEER_E_TIME
      Set LOCAL_CCI_PEER_ECHO_TIME to the TSval value of the TCP
      Timestamps option of the received segment.

5.2.1.  Initiator Mode Processing

   This section describes the initiator mode processing of a TCP host
   implementing RLCI.  In initiator mode, a host signals the occurrence
   of a local CCI to its peer, until the peer echoes reception of that
   CCI.  After receiving the echo, the host needs to acknowledge the
   echo reception, resulting in a 3-way handshake.  Figure 2 shows the
   corresponding state machine.

   At the beginning of a connection, i.e., before the first local CCI
   occurs, LOCAL_CCI is 0 and LOCAL_CCI_STATUS is LOCAL_CCI_IDLE.  This
   remains the case until TCP receives a local CCI (E_LOCAL_CCI).

   When that happens, TCP toggles LOCAL_CCI (A_TGL_LOCAL_CCI), sets
   LOCAL_CCI_STATUS := LOCAL_CCI_NEW, starts re-probing the new path
   (A_REPROBE_PATH) and forces a segment to be sent to the peer
   (A_FORCE_SEND).

   Note that all subsequently transmitted segments MUST contain a CCI
   option until LOCAL_CCI_STATUS becomes LOCAL_CCI_IDLE.  After the host
   receives the echo of the local CCI (C_ECHOED_LOCAL_CCI), it updates
   LOCAL_CCI_PEER_ECHO_TIME (A_UPD_CCI_PEER_E_TIME) and sets
   LOCAL_CCI_STATUS := LOCAL_CCI_ECHO_ACK.  The initiator remains in
   this state until it can send a segment with the CCI option
   (E_SEGMENT_SENT) that acknowledges reception of the CCI echo.  At
   that time, it sets LOCAL_CCI_STATUS := LOCAL_CCI_IDLE.

   The transition from LOCAL_CCI_IDLE to LOCAL_CCI_ECHO_ACK occurs if a
   segment acknowledging the reception of a CCI echo is lost, and the
   initiator retransmits the echo acknowledgment.
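
   The initiator transitions described in this section can be
   summarized in a short Python sketch.  It is an illustration under
   stated assumptions, not a reference implementation: the class and
   method names are hypothetical, option kind 253 is only a placeholder
   because the real value X has not been assigned, timestamp
   wraparound is ignored, and a local CCI that arrives while another
   one is still being processed is simply dropped.

```python
from enum import IntEnum


class LocalStatus(IntEnum):
    IDLE = 0      # LOCAL_CCI_IDLE
    NEW = 1       # LOCAL_CCI_NEW
    ECHO_ACK = 2  # LOCAL_CCI_ECHO_ACK


REMOTE_CCI_IDLE, REMOTE_CCI_ECHO = 0, 1
KIND_PLACEHOLDER = 253  # assumption: stands in for the unassigned Kind X


def encode_cci_option(c, ec, cs, ecs, kind=KIND_PLACEHOLDER):
    """Pack Kind, Length = 3 and the flag octet RES(3)|C|EC|CS(2)|ECS."""
    flags = ((c & 1) << 4) | ((ec & 1) << 3) | ((cs & 3) << 1) | (ecs & 1)
    return bytes([kind, 3, flags])


class Initiator:
    def __init__(self, first_peer_tsval=0):
        self.local_cci = 0
        self.status = LocalStatus.IDLE
        self.peer_echo_time = first_peer_tsval  # LOCAL_CCI_PEER_ECHO_TIME
        self.actions = []  # records triggered actions for inspection

    def on_local_cci(self):  # E_LOCAL_CCI
        if self.status != LocalStatus.IDLE:
            return  # another local CCI is still being processed
        self.local_cci ^= 1  # A_TGL_LOCAL_CCI
        self.status = LocalStatus.NEW
        self.actions += ["A_REPROBE_PATH", "A_FORCE_SEND"]

    def on_remote_option(self, ec, ecs, tsval):  # E_REMOTE_CCI
        echoed = (ec == self.local_cci and ecs == REMOTE_CCI_ECHO
                  and tsval > self.peer_echo_time)  # C_ECHOED_LOCAL_CCI
        if not echoed:
            return
        if self.status == LocalStatus.NEW:
            self.peer_echo_time = tsval  # A_UPD_CCI_PEER_E_TIME
        if self.status in (LocalStatus.NEW, LocalStatus.IDLE):
            self.status = LocalStatus.ECHO_ACK

    def send_segment(self, ec=0, ecs=REMOTE_CCI_IDLE):
        """Return the CCI option for the next segment, or None if idle."""
        if self.status == LocalStatus.IDLE and ecs == REMOTE_CCI_IDLE:
            return None
        option = encode_cci_option(self.local_cci, ec, self.status, ecs)
        if self.status == LocalStatus.ECHO_ACK:  # E_SEGMENT_SENT
            self.status = LocalStatus.IDLE
        return option
```

   In this sketch, the ec and ecs arguments of send_segment() stand for
   the responder-side variables REMOTE_CCI and REMOTE_CCI_STATUS, which
   a complete implementation would keep alongside the initiator state.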

   When a local CCI occurs (E_LOCAL_CCI) while LOCAL_CCI_STATUS !=
   LOCAL_CCI_IDLE, the host MUST ignore it and MUST NOT alter LOCAL_CCI,
   because it is already processing another local CCI.

      E_LOCAL_CCI =>
      A_TGL_LOCAL_CCI              E_REMOTE_CCI
      A_REPROBE_PATH               C_ECHOED_LOCAL_CCI =>
      A_FORCE_SEND                 A_UPD_CCI_PEER_E_TIME
            +----------------+     +----------------+
            |                |     |                |
            |                |     |                |
            |                |     |                |
            |                V     |                V
   +----------------+     +----------------+     +----------------+
   |                |     |                |     |                |
   |LOCAL_CCI_STATUS|     |LOCAL_CCI_STATUS|     |LOCAL_CCI_STATUS|
   |       ==       |     |       ==       |     |       ==       |
   |LOCAL_CCI_IDLE  |     |LOCAL_CCI_NEW   |     |LOCAL_CCI_ECHO_ |
   |                |     |                |     |ACK             |
   +----------------+     +----------------+     +----------------+
      ^    |                                           ^     |
      |    |                                           |     |
      |    +-------------------------------------------+     |
      |          E_REMOTE_CCI                                |
      |          C_ECHOED_LOCAL_CCI                          |
      |                                                      |
      |                                                      |
      +------------------------------------------------------+
                           E_SEGMENT_SENT

           Figure 2: State machine for initiator processing.

5.2.2.  Responder Mode Processing

   This section describes the responder mode processing of CCIs for a
   TCP host implementing the CCI option. In responder mode, a host
   echoes the last received remote CCI to its peer until it can be sure
   that the peer correctly received the echo. Figure 3 shows the
   corresponding state machine.

   At the beginning of a connection, REMOTE_CCI is 0 and
   REMOTE_CCI_STATUS is REMOTE_CCI_IDLE, i.e., the local host is not
   processing any remote CCIs.

   When TCP receives a segment with a CCI option (E_REMOTE_CCI)
   signaling a new remote CCI (C_NEW_REMOTE_CCI), it toggles
   REMOTE_CCI (A_TGL_REMOTE_CCI), changes REMOTE_CCI_STATUS to
   REMOTE_CCI_ECHO, updates REMOTE_CCI_PEER_TIME according to TSval
   (A_UPD_CCI_PEER_TIME), starts re-probing the new path
   (A_REPROBE_PATH) and forces a segment to be sent to the peer
   (A_FORCE_SEND).
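
   The responder behavior can be sketched in the same way. This is an
   illustration under stated assumptions: the exact definition of
   C_NEW_REMOTE_CCI is not repeated here, so the sketch assumes that a
   new remote CCI is recognized by a C value that differs from
   REMOTE_CCI together with a TSval newer than REMOTE_CCI_PEER_TIME.
   The class, its method names and the "actions" log are hypothetical,
   and TSval wrap-around is ignored.

```python
class Responder:
    """Responder-mode state for one connection (illustrative sketch)."""

    def __init__(self):
        self.remote_cci = 0              # REMOTE_CCI, toggles between 0 and 1
        self.status = "REMOTE_CCI_IDLE"  # REMOTE_CCI_STATUS
        self.peer_time = 0               # REMOTE_CCI_PEER_TIME
        self.actions = []                # hypothetical log of triggered actions

    def must_send_cci_option(self):
        """Every segment carries a CCI option while an echo is outstanding."""
        return self.status != "REMOTE_CCI_IDLE"

    def on_remote_cci(self, c, cs, tsval):
        """E_REMOTE_CCI: process the C and CS fields of a received option."""
        fresh = tsval > self.peer_time
        if fresh and c != self.remote_cci:
            # Assumed form of C_NEW_REMOTE_CCI. The same actions apply in
            # REMOTE_CCI_IDLE and in REMOTE_CCI_ECHO.
            self.remote_cci ^= 1         # A_TGL_REMOTE_CCI
            self.peer_time = tsval       # A_UPD_CCI_PEER_TIME
            self.status = "REMOTE_CCI_ECHO"
            self.actions += ["A_REPROBE_PATH", "A_FORCE_SEND"]
        elif (fresh and c == self.remote_cci
              and cs == "LOCAL_CCI_ECHO_ACK"
              and self.status == "REMOTE_CCI_ECHO"):
            # C_ECHOED_REMOTE_CCI: the peer confirmed that it saw the echo.
            self.status = "REMOTE_CCI_IDLE"
```

   Segments carrying stale timestamps leave the state untouched, which
   is what protects the handshake against reordered CCI options.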

   Note that all subsequently transmitted segments MUST contain a CCI
   option until REMOTE_CCI_STATUS is again REMOTE_CCI_IDLE. This
   transition occurs when the peer acknowledges the reception of the CCI
   echo (C_ECHOED_REMOTE_CCI).

      E_REMOTE_CCI                   E_REMOTE_CCI
      C_NEW_REMOTE_CCI =>            C_NEW_REMOTE_CCI =>
      A_TGL_REMOTE_CCI               A_TGL_REMOTE_CCI
      A_UPD_CCI_PEER_TIME            A_UPD_CCI_PEER_TIME
      A_REPROBE_PATH                 A_REPROBE_PATH
      A_FORCE_SEND                   A_FORCE_SEND
           +-----------------+           +-------------+
           |                 |           |             |
           |                 V           |             |
   +-----------------+     +-----------------+         |
   |REMOTE_CCI_STATUS|     |REMOTE_CCI_STATUS|         |
   |       ==        |     |       ==        |         |
   |REMOTE_CCI_IDLE  |     |REMOTE_CCI_ECHO  |         |
   +-----------------+     +-----------------+         |
           ^                 |           ^             |
           |                 |           |             |
           +-----------------+           +-------------+
              E_REMOTE_CCI
              C_ECHOED_REMOTE_CCI

           Figure 3: State machine for responder processing.

   If TCP receives a new remote CCI while REMOTE_CCI_STATUS ==
   REMOTE_CCI_ECHO, this indicates that the acknowledgment of a previous
   CCI echo may have been lost and that a new CCI has since occurred at
   the peer. In this case, TCP MUST perform the same actions as if
   REMOTE_CCI_STATUS == REMOTE_CCI_IDLE.

5.3.  Re-Probing Path Characteristics

   When a TCP connection receives a new CCI, it MUST re-probe path
   characteristics in order to avoid causing congestion by transmitting
   based on stale path state information. In principle, this is
   similar to the initial slow-start: the sender MUST NOT transmit more
   than the default initial window (INIT_WINDOW) of data after a new
   CCI is received and it MUST reset the congestion control state (CWND
   and SS_THRESH), round-trip time measurement (RTTM) state and RTO
   timer, as if this were a new connection [RFC2581][RFC2988].

   If Path MTU Discovery (PMTUD) is in use, the PMTUD state MUST also be
   reset [RFC1191][RFC1981][RFC4821].
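
   The reset performed by A_REPROBE_PATH can be sketched as follows.
   This is an illustration only: the Connection record, its field
   names and the function reprobe_path() are hypothetical, the SMSS
   value is an assumed example, and the initial values follow the
   defaults of [RFC2581] (initial window of at most two segments,
   arbitrarily high SS_THRESH) and [RFC2988] (initial RTO of three
   seconds).

```python
from dataclasses import dataclass
from typing import Optional

SMSS = 1460                    # assumed sender maximum segment size (bytes)
INITIAL_WINDOW = 2 * SMSS      # largest initial window allowed by RFC 2581
INITIAL_SSTHRESH = 2**31 - 1   # stands in for "arbitrarily high"
INITIAL_RTO = 3.0              # seconds, before any RTT sample (RFC 2988)


@dataclass
class Connection:
    """Per-connection path state touched by A_REPROBE_PATH (hypothetical)."""
    cwnd: int = INITIAL_WINDOW
    ssthresh: int = INITIAL_SSTHRESH
    srtt: Optional[float] = None       # None means "no RTT sample yet"
    rttvar: Optional[float] = None
    rto: float = INITIAL_RTO
    path_mtu: Optional[int] = None     # None means "PMTUD starts over"
    snd_max: int = 0                   # highest sequence number sent so far
    cci_controlled_cwnd: bool = False  # CCI_CONTROLLED_CWND
    last_cci_time: int = 0             # LAST_CCI_TIME (timestamp clock)
    cci_sndmax: int = 0                # CCI_SNDMAX


def reprobe_path(conn: Connection, now: int) -> None:
    """Discard path state and restart probing, as for a new connection."""
    conn.cwnd = INITIAL_WINDOW
    conn.ssthresh = INITIAL_SSTHRESH
    conn.srtt = None
    conn.rttvar = None
    conn.rto = INITIAL_RTO
    conn.path_mtu = None
    # Remember where "before the CCI" ends, for later ACK filtering.
    conn.cci_controlled_cwnd = True
    conn.last_cci_time = now
    conn.cci_sndmax = conn.snd_max
```

   Keeping all path-dependent fields in one record makes it harder to
   forget one of them when the path changes.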

   One difference from an initial slow-start is that after a CCI, the
   connection may have segments in flight towards the destination along
   a previous path. Therefore, after a CCI, TCP MUST ignore any ACKs
   received for data that was sent before the CCI and it MUST update the
   congestion window solely based on ACKs for data that was sent after
   the CCI occurred.

   The mechanism for distinguishing ACKs for data sent after a CCI from
   ACKs for data sent before it uses TCP Timestamps options. When a
   host receives a new CCI (either local or remote), LAST_CCI_TIME MUST
   be set to the current local time, CCI_SNDMAX MUST be set to the
   highest sequence number transmitted so far and CCI_CONTROLLED_CWND
   MUST be set to true.

   While CCI_CONTROLLED_CWND == true, TCP MUST update the congestion
   window based only on inbound ACKs that contain a TS Echo Reply
   (TSecr) value greater than or equal to LAST_CCI_TIME. Any inbound
   ACK with a TSecr value less than LAST_CCI_TIME MUST NOT cause an
   update to the congestion window, even if it advances the window. If
   CCI_CONTROLLED_CWND is true and the host receives an ACK that
   acknowledges a sequence number greater than or equal to CCI_SNDMAX,
   CCI_CONTROLLED_CWND MUST be set to false and the congestion control
   algorithm MUST begin to process all ACKs normally, without checking
   their Timestamps options.

5.4.  Speculative Retransmission

   The basic idea behind the speculative retransmission is to allow TCP
   to resume stalled connections as soon as it receives an indication
   that connectivity to previously unreachable peers may have returned.

   When a TCP connection receives a new CCI, either from the local
   stack or in a CCI TCP option from the peer, and is currently stalled
   in exponential back-off, it MUST immediately initiate the standard
   retransmission procedure, just as if the RTO for the connection had
   expired.
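
   The ACK filtering rule of Section 5.3 can be sketched as follows.
   This is a minimal illustration: the CciState record and the function
   name are hypothetical, and wrap-around of sequence numbers and
   timestamps is ignored. The sketch makes one interpretive choice that
   the text leaves open: the ACK that reaches CCI_SNDMAX is itself
   still subject to the timestamp check, and only later ACKs are
   processed without it.

```python
from dataclasses import dataclass


@dataclass
class CciState:
    """The three re-probing variables of Section 5.3 (hypothetical record)."""
    cci_controlled_cwnd: bool = False  # CCI_CONTROLLED_CWND
    last_cci_time: int = 0             # LAST_CCI_TIME
    cci_sndmax: int = 0                # CCI_SNDMAX


def ack_may_update_cwnd(state: CciState, ack_number: int, tsecr: int) -> bool:
    """Return True if this inbound ACK may be used to update CWND."""
    if not state.cci_controlled_cwnd:
        return True                    # normal processing, no timestamp check
    # Only ACKs that echo a timestamp taken at or after the CCI count.
    counts = tsecr >= state.last_cci_time
    if ack_number >= state.cci_sndmax:
        # Everything sent before the CCI is now acknowledged.
        state.cci_controlled_cwnd = False
    return counts
```

   An ACK for which the function returns False can still advance the
   send window; it is only excluded from congestion window growth.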

6.  Discussion

   This section discusses some design choices of the RLCI mechanisms
   that can affect TCP performance under certain circumstances.

6.1.  Triggered Segment Transmission during Steady-State

   A TCP stack that implements RLCI mechanisms and receives a local CCI
   immediately sends a TCP segment (A_FORCE_SEND) in order to inform the
   other end of the CCI and resets all path information
   (A_REPROBE_PATH). When TCP is stalled in exponential back-off, this
   is taken care of by the speculative retransmission procedure that is
   triggered by the CCI.

   On the other hand, when TCP is in steady-state, it sends a new
   segment (A_FORCE_SEND) if there is any new data queued for
   transmission. As usual, the number of unacknowledged segments is
   limited by CWND. However, CWND has just been reset to its initial
   value. This means that the forced transmission may send a segment
   that lies outside the current congestion window. Although this
   behavior may appear to be aggressive, it is in fact as conservative
   as a newly starting connection, because only a single
   unacknowledged segment is sent along the path after the CCI.

6.2.  Impact of Packet Loss

   If a connection is in exponential back-off when a CCI occurs, TCP
   considers all unacknowledged segments to be lost and the speculative
   retransmission procedure starts immediately.

   On the other hand, if the connection is in steady-state when a CCI
   occurs, TCP considers all unacknowledged segments to still be in
   flight and continues sending new data. Depending on what caused the
   CCI, four scenarios are possible that differ in what happens to
   segments and ACKs in flight:

   1.  All (or at least the vast majority of) segments and ACKs in
       flight reach their respective destinations, i.e., there are no
       losses. In this case, TCP acts as if a new connection had
       started and re-probes the new path.

   2.
       Some of the ACKs in flight from the receiver to the sender are
       lost. In this case, TCP behaves exactly as above, because a
       cumulative ACK for the new segment sent along the path after the
       CCI acknowledges all the previously unacknowledged segments.

   3.  Some of the data segments in flight from the sender to the
       receiver are lost. In this case, the new data segment
       transmitted after the CCI causes a duplicate ACK. As this
       duplicate ACK does not cause TCP to send another data segment,
       the connection stalls and an RTO occurs. After the RTO, the
       standard retransmission procedure takes place with SS_THRESH
       equal to INITIAL_WINDOW/2 (i.e., the minimum allowed). This
       disables slow-start and severely degrades performance. A
       possible solution is to execute the speculative retransmission
       procedure after receiving a CCI even if the connection is in
       steady-state.

   4.  Some of the data segments and some of the ACKs that are in flight
       are lost. This case is similar to the previous one.

   In all these cases, it is also possible that the round-trip time
   changes significantly after the CCI, reordering data segments and
   ACKs that are still in flight with ones sent after the CCI. These
   reorderings appear to TCP as losses, and may result in the connection
   experiencing one of the above cases even if there was no actual
   packet loss.

6.3.  Use of Limited Transmit with RLCI

   As described in the previous section, when a connection is in
   steady-state, a connectivity-change indication (CCI) resets all path
   information of TCP and causes one new data segment to be sent. If a
   significant number of data segments were lost before the CCI, the
   new data segment transmitted after the CCI causes a duplicate ACK.
   As this duplicate ACK does not trigger TCP to send another data
   segment, the connection stalls and an RTO occurs.

   Limited Transmit [RFC3042] can be used in case of packet loss in
   order to cause the transmission of three duplicate ACKs and trigger
   the fast retransmission procedure. Because it must not raise the
   amount of outstanding data above the congestion window plus two
   segments, it cannot always be used after a CCI, when CWND has just
   been reset. If the connection has more outstanding data than
   INITIAL_WINDOW plus two segments before a CCI, resetting CWND to the
   initial value after the CCI leaves more outstanding data than the
   new CWND plus two segments, which disables Limited Transmit.

   A modified Limited Transmit algorithm can be used in combination with
   RLCI:

   If CCI_CONTROLLED_CWND is true:
      The Limited Transmit Algorithm as described in [RFC3042] should be
      followed, but without checking the amount of outstanding data,
      i.e., if a TCP sender has previously unsent data queued for
      transmission it should transmit new data upon the arrival of the
      first two consecutive duplicate ACKs when the receiver's
      advertised window allows this transmission.

   If CCI_CONTROLLED_CWND is false:
      The Limited Transmit Algorithm as described in [RFC3042] should be
      followed unmodified.

   When the fast retransmission procedure is triggered by the modified
   Limited Transmit after a CCI, SS_THRESH is set to INITIAL_WINDOW/2
   (i.e., the minimum allowed), because CWND before fast retransmission
   was equal to INITIAL_WINDOW. As a result, slow-start is disabled,
   which decreases TCP performance.

   A minor modification can keep SS_THRESH unmodified in the previous
   case, i.e., if CCI_CONTROLLED_CWND == true and CWND ==
   INITIAL_WINDOW, keep SS_THRESH unmodified (at its initial value)
   upon the reception of the third duplicate ACK that triggers the fast
   retransmission procedure.

6.4.
Simultaneous Processing of Connectivity-Change Indications

   As mentioned in Section 5.2.1, if a local CCI occurs (E_LOCAL_CCI)
   while LOCAL_CCI_STATUS != LOCAL_CCI_IDLE, the host MUST ignore it,
   because it is already processing another local CCI. As a result,
   each end can process only one local CCI at a time. Consequently, as
   every remote CCI at one end is triggered by a local CCI at the other
   end, each end can also process only one remote CCI at a time.

   On the other hand, if both hosts receive connectivity-change
   indications from their local stacks (local CCIs) at almost the same
   time, there is a possibility of simultaneous processing of local and
   remote CCIs at both ends. In that case, path re-probing is triggered
   twice at each end within an interval that can be shorter than one
   RTT. As this does not improve TCP performance, it can be avoided by
   triggering the A_REPROBE_PATH action only if CCI_CONTROLLED_CWND ==
   false.

7.  Security Considerations

   The only foreseen security considerations with the techniques
   presented in this document result from either an attacker's ability
   to spoof valid TCP segments with CCI options that seemingly indicate
   connectivity changes, or an attacker's ability to generate bogus CCIs
   locally. An attacker might produce a stream of such false indicators
   that could keep a connection in slow-start at the initial window.
   One possible defense against this type of attack is to rate-limit the
   response to CCIs (whether local or remote). This attack is also
   probably less serious than other attacks that such an empowered
   adversary could perform, like resetting the connection or injecting
   data. A similar effect could be achieved without the new CCI option
   by forging duplicate ACKs that would keep a sender in loss recovery.
   If both sets of IP addresses, port numbers, and sequence numbers are
   guessable for a connection, then the connection should employ other
   measures [RFC4953] for protection against spoofed segments.

8.  IANA Considerations

   This section is to be interpreted according to
   [I-D.narten-iana-considerations-rfc2434bis].

   This document does not define any new namespaces. It requests that
   IANA allocate a new 8-bit TCP option number for the CCI option from
   the registry maintained at
   http://www.iana.org/assignments/tcp-parameters.

9.  Acknowledgments

   This draft combines and obsoletes [I-D.swami-tcp-lmdr] and
   [I-D.eggert-tcpm-tcp-retransmit-now]. The authors would like to
   thank Mark Allman, Marcus Brunner, Alfred Hoenes, Shashikant
   Maheshwari, Kacheong Poon, Juergen Quittek, Stefan Schmid and Joe
   Touch for their comments and suggestions on this draft as well as the
   two original drafts.

   Simon Schuetz and Lars Eggert are partly funded by the Trilogy
   project, a research project supported by the European Commission
   under its Seventh Framework Program.

   Wesley Eddy's work on this document was performed at NASA's Glenn
   Research Center, while in support of the NASA Space Communications
   Architecture Working Group (SCAWG), and the FAA/Eurocontrol Future
   Communications Study (FCS).

10.  References

10.1.  Normative References

   [I-D.narten-iana-considerations-rfc2434bis]
              Narten, T. and H. Alvestrand, "Guidelines for Writing an
              IANA Considerations Section in RFCs",
              draft-narten-iana-considerations-rfc2434bis-08 (work in
              progress), October 2007.

   [RFC0793]  Postel, J., "Transmission Control Protocol", STD 7,
              RFC 793, September 1981.

   [RFC1191]  Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191,
              November 1990.

   [RFC1323]  Jacobson, V., Braden, B., and D. Borman, "TCP Extensions
              for High Performance", RFC 1323, May 1992.

   [RFC1981]  McCann, J., Deering, S., and J. Mogul, "Path MTU Discovery
              for IP version 6", RFC 1981, August 1996.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2581]  Allman, M., Paxson, V., and W. Stevens, "TCP Congestion
              Control", RFC 2581, April 1999.

   [RFC2988]  Paxson, V. and M. Allman, "Computing TCP's Retransmission
              Timer", RFC 2988, November 2000.

   [RFC3042]  Allman, M., Balakrishnan, H., and S. Floyd, "Enhancing
              TCP's Loss Recovery Using Limited Transmit", RFC 3042,
              January 2001.

   [RFC4821]  Mathis, M. and J. Heffner, "Packetization Layer Path MTU
              Discovery", RFC 4821, March 2007.

10.2.  Informative References

   [DUKE]     Duke, M., Henderson, T., and J. Meegan, "Experience with
              ``Link-UP Notification'' Over a Mobile Satellite Link",
              ACM Computer Communication Review, Vol. 34, No. 3,
              July 2004.

   [EDDY]     Eddy, W. and Y. Swami, "Adapting End-host Congestion
              Control for Mobility", NASA Glenn Research Center
              Technical Report, CR-2005-213838, July 2005.

   [I-D.dawkins-trigtran-linkup]
              Dawkins, S., "End-to-end, Implicit 'Link-Up'
              Notification", draft-dawkins-trigtran-linkup-01 (work in
              progress), October 2003.

   [I-D.eggert-tcpm-tcp-retransmit-now]
              Eggert, L., "TCP Extensions for Immediate
              Retransmissions", draft-eggert-tcpm-tcp-retransmit-now-02
              (work in progress), June 2005.

   [I-D.ietf-hip-mm]
              Henderson, T., "End-Host Mobility and Multihoming with the
              Host Identity Protocol", draft-ietf-hip-mm-05 (work in
              progress), March 2007.

   [I-D.ietf-tcpimpl-restart]
              Hughes, A., Touch, J., and J. Heidemann, "Issues in TCP
              Slow-Start Restart After Idle",
              draft-ietf-tcpimpl-restart-00 (work in progress),
              March 1998.

   [I-D.ietf-tcpm-tcp-uto]
              Eggert, L. and F. Gont, "TCP User Timeout Option",
              draft-ietf-tcpm-tcp-uto-08 (work in progress),
              November 2007.

   [I-D.swami-tcp-lmdr]
              Swami, Y., "Lightweight Mobility Detection and Response
              (LMDR) Algorithm for TCP", draft-swami-tcp-lmdr-07 (work
              in progress), March 2006.

   [KOODLI]   Koodli, R. and C. Perkins, "Fast Handovers and Context
              Transfers in Mobile Networks", ACM Computer Communication
              Review, Vol. 31, No. 5, October 2001.

   [OTT]      Ott, J. and D. Kutscher, "OTT Internet: IEEE 802.11b for
              Automobile Users", Proc. Infocom 2004, March 2004.

   [RFC1122]  Braden, R., "Requirements for Internet Hosts -
              Communication Layers", STD 3, RFC 1122, October 1989.

   [RFC2131]  Droms, R., "Dynamic Host Configuration Protocol",
              RFC 2131, March 1997.

   [RFC2460]  Deering, S. and R. Hinden, "Internet Protocol, Version 6
              (IPv6) Specification", RFC 2460, December 1998.

   [RFC3168]  Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
              of Explicit Congestion Notification (ECN) to IP",
              RFC 3168, September 2001.

   [RFC3344]  Perkins, C., "IP Mobility Support for IPv4", RFC 3344,
              August 2002.

   [RFC3775]  Johnson, D., Perkins, C., and J. Arkko, "Mobility Support
              in IPv6", RFC 3775, June 2004.

   [RFC3819]  Karn, P., Bormann, C., Fairhurst, G., Grossman, D.,
              Ludwig, R., Mahdavi, J., Montenegro, G., Touch, J., and L.
              Wood, "Advice for Internet Subnetwork Designers", BCP 89,
              RFC 3819, July 2004.

   [RFC4306]  Kaufman, C., "Internet Key Exchange (IKEv2) Protocol",
              RFC 4306, December 2005.

   [RFC4782]  Floyd, S., Allman, M., Jain, A., and P. Sarolahti,
              "Quick-Start for TCP and IP", RFC 4782, January 2007.

   [RFC4907]  Aboba, B., "Architectural Implications of Link
              Indications", RFC 4907, June 2007.

   [RFC4953]  Touch, J., "Defending TCP Against Spoofing Attacks",
              RFC 4953, July 2007.

   [RFC4957]  Krishnan, S., Montavont, N., Njedjou, E., Veerepalli, S.,
              and A. Yegin, "Link-Layer Event Notifications for
              Detecting Network Attachments", RFC 4957, August 2007.

   [SCHUETZ]  Schuetz, S., Eggert, L., Schmid, S., and M. Brunner,
              "Protocol Enhancements for Intermittently Connected
              Hosts", ACM Computer Communication Review, Vol. 35, No. 3,
              July 2005.

   [SCOTT]    Scott, J. and G. Mapp, "Link layer-based TCP optimization
              for disconnecting networks", ACM Computer Communication
              Review, Vol. 33, No. 5, October 2003.

Editorial Comments

   [footnote-1]  The authors have heard the idea of triggering
                 retransmits based on connectivity events of
                 directly-connected links being attributed to Phil Karn
                 ("kick" operation in the KA9Q TCP stack). A thread
                 from the PILC mailing list in 2000 discusses some
                 thoughts on this
                 (http://www.isi.edu/pilc/list/archive/0691.html).

   [footnote-2]  Although this specification introduces eight new
                 per-connection state variables, a preliminary
                 implementation of an earlier revision of this mechanism
                 [I-D.swami-tcp-lmdr] only required around a hundred
                 lines of kernel code.

Appendix A.  Background: Classification of Connectivity Disruptions

   Connectivity disruptions can occur in many different situations.

   They can be due to wireless interference, movement out of a wireless
   coverage area, switching between access networks, or simply due to
   unplugging an Ethernet cable. Depending on the situation in which
   they occur, the implications of connectivity disruptions are
   different and must be handled appropriately. This section attempts
   to classify different types of connectivity disruptions and discusses
   their implications and impact on TCP.

   Two main properties of connectivity disruptions affect how TCP reacts
   to them: their duration and whether the path characteristics have
   significantly changed after they end. This document distinguishes
   between "short" and "long" disruptions and "changed" and "unchanged"
   path characteristics.
   Note that these two categories are orthogonal to each other, i.e.,
   four types of connectivity disruptions exist.

   Connectivity disruptions are "short" for a given TCP connection if
   connectivity returns before the RTO fires for the first time, i.e.,
   when TCP is still in steady-state. In this case, standard TCP
   recovers lost data segments through Fast Retransmit and lost ACKs
   through successfully delivered later ACKs. Appendix A.1 briefly
   describes this case.

   Connectivity disruptions are "long" for a given TCP connection if
   the RTO fires at least once before connectivity returns, i.e., when
   TCP is in exponential back-off. In this case, TCP can be inefficient
   in its retransmission scheme, as described in Appendix A.2.

   Whether or not path characteristics change when connectivity returns
   is a second important factor for TCP's retransmission scheme.
   Standard TCP implicitly assumes that path characteristics remain
   unchanged across short disruptions by performing Fast Retransmit
   using the path parameters collected before the disruption. For long
   disruptions, standard TCP is more conservative and performs
   slow-start, re-probing the path characteristics from scratch.
   However, the standard behavior can be inefficient because of when it
   is initiated.

   These implicit assumptions can cause standard TCP to misbehave or
   perform inefficiently in some scenarios. Figure 4 illustrates the
   standard TCP behavior.

               +-----------------------+-----------------------+
      Short    | Fast Retransmit using | Fast Retransmit using |
      Duration | currently collected   | currently collected   |
      < RTO    | path characteristics  | path characteristics  |
               +-----------------------+-----------------------+
      Long     |                       |                       |
      Duration |      Slow-start       |      Slow-start       |
      >= RTO   |                       |                       |
               +-----------------------+-----------------------+
                    Unchanged Path          Changed Path
                    Characteristics         Characteristics

                   Figure 4: Standard TCP behavior.

A.1.  Short Connectivity Disruptions

   One common cause of short connectivity disruptions that result in a
   change of the end-to-end path characteristics is transparent network
   layer mobility, via protocols such as Mobile IP, NEMO, or HIP. These
   protocols generally hide mobility events from the transport layer,
   but cannot mask the resulting changes to the end-to-end path that
   established TCP connections transmit over.

   Consider a Mobile IP scenario as shown in Figure 5. At time T, a
   mobile node MN attaches to access network Net-1, connected to the
   Internet through access router AR-1 and has the care-of address
   . It establishes a TCP connection to the correspondent
   node CN. While MN attaches to AR-1, packets between CN and
   follow PATH-1 (via Cloud-1 and AR-1). Assume that at some time
   T+1, MN moves and then attaches to Net-2, which is reachable through
   AR-2 with the care-of address . While MN attaches to
   AR-2, all packets between CN and follow PATH-2 (through
   Cloud-2 and AR-2).

                 <---------PATH-1---------->

                 /---------\   +------+
                 |         |   |      |     Net-1
             +---+ Cloud-1 +---+ AR-1 +-----> MN (time=T)
             |   |         |   |      |
             |   \----+----/   +---+--+       |
             |        |            |
   CN <------+        |          PATH-3       |
             |        |            |
             |   /----V----\   +-------+      V
             |   |         |   |       |
             +---+ Cloud-2 +---+ AR-2  +-----> MN (time=T+1)
                 |         |   |       |     Net-2
                 \---------/   +-------+

                 <--------PATH-2----------->

                     Figure 5: Mobility example.

   During a transient disconnected period, MN may have disconnected from
   Net-1 and not yet attached to Net-2. Consequently, AR-1 may not be
   able to deliver packets to MN. This could result in a burst of
   packet losses. Several approaches for "fast" or "seamless" handovers
   exist that involve adding machinery to the ARs to buffer and redirect
   packets originally sent to Net-1 towards Net-2, rather than dropping
   them (e.g., [KOODLI]).

   As long as MN remains in Net-1, standard congestion control
   algorithms [RFC2581] are sufficient. However, once MN moves from
   Net-1 to Net-2, two different scenarios are possible depending on
   network topology:

   o  In the first scenario, with standard Mobile IPv4, all packets
      destined to are dropped by AR-1 once MN has moved.
      Since the latency involved in establishing a new tunnel to the HA
      is on the order of the RTT (2*RTT in case of Mobile IPv6), roughly
      an entire window's worth of data and ACKs will be dropped by AR-1.
      Because of this burst loss, CN and MN are likely to incur
      expensive retransmission timeouts.

   o  In the second scenario, with a fast handover mechanism in place,
      losses are masked through buffering and tunneling between routers
      AR-1 and AR-2. The exact sequence of buffering and forwarding
      between the ARs is not guaranteed to occur in a manner consistent
      with the available bandwidth of PATH-3 or conformant to TCP's
      clocking expectations.
      This can cause TCP's behavior over PATH-2 to be based on the
      unrelated properties of PATH-1 and PATH-3.

   After attaching to Net-2, reception of stale ACKs (for data sent on
   PATH-1) will cause MN to incorrectly inflate its congestion window.
   These stale ACKs do not provide any indication of the congestion
   along PATH-2. CN's congestion window becomes similarly inflated by
   ACKs that MN sends for data segments redirected over PATH-3. If the
   congestion windows from PATH-1 are already too big for PATH-2, this
   can overload Net-2 or PATH-2, causing packet loss and timeouts.

   On the other hand, if the available bandwidth along PATH-2 is greater
   than along PATH-1, and if the sender is in congestion avoidance, it
   will need potentially many RTTs before utilizing the available path
   capacity. This is due to relatively slow bandwidth increase during
   congestion avoidance caused by a stale SS_THRESH. (See [EDDY] for
   details.)

A.2.  Long Connectivity Disruptions

   For long disruptions, standard TCP performs slow-start after
   connectivity returns, because the retransmission timeout (RTO) has
   expired. This conservative strategy avoids overloading the new path.
   However, TCP's general exponential back-off retransmission strategy
   can time these slow-starts such that performance decreases.

   When a long connectivity disruption occurs along the path between a
   host and its peer while the host is transmitting data, it stops
   receiving ACKs. After the RTO expires, the host attempts to
   retransmit the first unacknowledged segment. TCP implementations
   that follow the recommended RTO management proposed in [RFC2988]
   double the RTO after each retransmission attempt until it exceeds 60
   seconds. This scheme causes a host to attempt to retransmit across
   established connections roughly once a minute.
   (More frequently during the first minute or two of the connectivity
   disruption, while the RTO is still being backed off.)

   When the long connectivity disruption ends, standard TCP
   implementations still wait until the RTO expires before attempting
   retransmission. Figure 6 illustrates this behavior. Depending on
   when connectivity becomes available again, this can waste up to a
   minute of connectivity for TCPs that implement the recommended RTO
   management described in [RFC2988]. For TCP implementations that do
   not implement [RFC2988], even longer connectivity periods may be
   wasted. For example, Linux uses 120 seconds as the maximum RTO by
   default.

   Sequence
   number         X = Successfully transmitted segment
     ^            O = Lost segment
     |     :                     :              : X
     |     :                     :              :X
     |     OO  O   O     O     O :              X
     |    X:                     :              :
     |   X :                     :<------------>:
     |  X  :                     :    Wasted    :
     | X   :                     :  connection  :
     |X    :                     :     time     :
     +-----:---------------------:--------------:-------->
           :                     :              :     Time
     Connectivity          Connectivity        TCP
         gone                  back         retransmit

      Figure 6: Standard TCP behavior in the presence of disrupted
                              connectivity.

   This retransmission behavior is not efficient, especially in
   scenarios where connectivity periods are short and connectivity
   disruptions are frequent [OTT]. Experiments show that TCP
   performance across a path with frequent disruptions is significantly
   worse, compared to a similar path without disruptions [SCHUETZ].

   In the ideal case, TCP would attempt a retransmission as soon as
   connectivity to its peer was re-established. Figure 7 illustrates
   the ideal behavior.
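
   The amount of connectivity that standard back-off can waste can be
   estimated with a small calculation. The following sketch is an
   illustration only: it assumes that the RTO doubles after every
   attempt and is capped at a maximum, and the function names and the
   default values (a 1 second starting RTO and a 60 second cap) are
   example assumptions rather than values mandated by this document.
   Time is measured from the moment the retransmission timer was last
   armed before the disruption.

```python
def next_retransmit_at(back_at, initial_rto=1.0, max_rto=60.0):
    """Time of the first retransmission at or after connectivity returns.

    back_at is the time at which connectivity is back, in seconds after
    the retransmission timer was last armed before the disruption.
    """
    t = 0.0
    rto = initial_rto
    while True:
        t += rto                      # the timer fires, TCP retransmits
        if t >= back_at:
            return t                  # first attempt that can succeed
        rto = min(2 * rto, max_rto)   # exponential back-off, capped


def wasted_time(back_at, initial_rto=1.0, max_rto=60.0):
    """Seconds of available connectivity that pass before TCP retries."""
    return next_retransmit_at(back_at, initial_rto, max_rto) - back_at
```

   With these example values the timer fires 1, 3, 7, 15, 31, 63 and
   123 seconds after it was armed, so a connection whose connectivity
   returns after 70 seconds stays idle for a further 53 seconds.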

   Sequence
   number         X = Successfully transmitted segment
     ^            O = Lost segment
     |     :                     : X            :
     |     :                     :X             :
     |     OO  O   O     O     O X              :
     |    X:                     :              :
     |   X :                     :<------------>:
     |  X  :                     :  Efficiency  :
     | X   :                     : improvement  :
     |X    :                     :              :
     +-----:---------------------:--------------:-------->
           :                     :              :     Time
     Connectivity      Connectivity            Next
         gone        back := immediate       scheduled
                       TCP retransmit        retransmit

        Figure 7: Ideal TCP behavior in the presence of disrupted
                              connectivity

   The ideal behavior is difficult to achieve for arbitrary connectivity
   disruptions. One obviously problematic approach would use
   higher-frequency retransmission attempts to enable earlier detection
   of whether connectivity has returned. This can generate significant
   amounts of extra traffic. Other proposals attempt to trigger faster
   retransmissions by retransmitting buffered or newly-crafted segments
   from inside the network
   [SCOTT][I-D.dawkins-trigtran-linkup][DUKE][RFC3819].

   Note that scenarios exist where path characteristics remain unchanged
   after long connectivity disruptions. In this case, even an
   intelligently scheduled slow-start is inefficient, because TCP could
   safely resume transmitting at the old rate instead of slow-starting.
   Although originally developed to avoid line-rate bursts, techniques
   for the well-known "slow-start after idle" case
   [I-D.ietf-tcpimpl-restart] may be useful to further improve
   performance after a disruption ends in such a scenario. This
   document does not currently describe this additional optimization,
   and an open question remains on how unchanged path characteristics
   after long connectivity disruptions could be validated by an end
   host.

Appendix B.
Document Revision History

   +----------+--------------------------------------------------------+
   | Revision | Comments                                               |
   +----------+--------------------------------------------------------+
   | 03       | Mainly editorial and textual changes according to      |
   |          | feedback received since last version.                  |
   | 02       | Major modification to the RLCI mechanism for           |
   |          | implementing a 3-way handshake that ensures that both  |
   |          | peers are informed about a connectivity-change         |
   |          | indication. CCI option format, RLCI variables          |
   |          | maintained by the TCP peers and the related state      |
   |          | machines are affected by that modification.            |
   | 01       | Major revision of the description of the               |
   |          | connectivity-change indication TCP option and its      |
   |          | processing in Section 5. Other formatting changes to   |
   |          | the document include moving some background material   |
   |          | to the appendix.                                       |
   | 00       | Initial version. This document is a merge of and       |
   |          | obsoletes [I-D.eggert-tcpm-tcp-retransmit-now] and     |
   |          | [I-D.swami-tcp-lmdr].                                  |
   +----------+--------------------------------------------------------+

Authors' Addresses

   Simon Schuetz
   NEC Laboratories Europe
   Kurfuerstenanlage 36
   Heidelberg  69115
   Germany

   Phone: +49 6221 4342 165
   Email: simon.schuetz@nw.neclab.eu
   URI:   http://www.nw.neclab.eu

   Nikolaos Koutsianas
   Nokia Research Center

   Email: nkout@mobile.ntua.gr

   Lars Eggert
   Nokia Research Center
   P.O. Box 407
   Nokia Group  00045
   Finland

   Phone: +358 50 48 24461
   Email: lars.eggert@nokia.com
   URI:   http://research.nokia.com/people/lars_eggert/

   Wesley M.
Eddy 1362 Verizon Federal Network Systems 1363 NASA Glenn Research Center 1364 21000 Brookpark Road, MS 54-5 1365 Cleveland, OH 44135 1366 USA 1368 Email: weddy@grc.nasa.gov 1369 Yogesh Prem Swami 1370 Nokia Research Center, Dallas 1371 955 Page Mill Road 1372 Palo Alto, California 94304 1373 USA 1375 Phone: +1 972 374 0669 1376 Email: yogesh.swami@nokia.com 1378 Khiem Le 1379 Nokia Siemens Networks 1380 6000 Connection Drive 1381 Irving, TX 75039 1382 USA 1384 Phone: +1 972 342 3502 1385 Email: khiem.le@nsn.com 1387 Full Copyright Statement 1389 Copyright (C) The IETF Trust (2008). 1391 This document is subject to the rights, licenses and restrictions 1392 contained in BCP 78, and except as set forth therein, the authors 1393 retain all their rights. 1395 This document and the information contained herein are provided on an 1396 "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS 1397 OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND 1398 THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS 1399 OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF 1400 THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED 1401 WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. 1403 Intellectual Property 1405 The IETF takes no position regarding the validity or scope of any 1406 Intellectual Property Rights or other rights that might be claimed to 1407 pertain to the implementation or use of the technology described in 1408 this document or the extent to which any license under such rights 1409 might or might not be available; nor does it represent that it has 1410 made any independent effort to identify any such rights. Information 1411 on the procedures with respect to rights in RFC documents can be 1412 found in BCP 78 and BCP 79. 
   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.

Acknowledgment

   Funding for the RFC Editor function is provided by the IETF
   Administrative Support Activity (IASA).