TCPM WG                                                         J. Touch
Internet Draft                                               Independent
Intended status: Informational                                  M. Welzl
Obsoletes: 2140                                                 S. Islam
Expires: October 2019                                 University of Oslo
                                                          April 15, 2019

                   TCP Control Block Interdependence
                     draft-ietf-tcpm-2140bis-00.txt

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   This document may contain material from IETF Documents or IETF
   Contributions published or made publicly available before November
   10, 2008. The person(s) controlling the copyright in some of this
   material may not have granted the IETF Trust the right to allow
   modifications of such material outside the IETF Standards Process.
   Without obtaining an adequate license from the person(s) controlling
   the copyright in such materials, this document may not be modified
   outside the IETF Standards Process, and derivative works of it may
   not be created outside the IETF Standards Process, except to format
   it for publication as an RFC or to translate it into languages other
   than English.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This Internet-Draft will expire on October 15, 2019.

Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document. Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided
   without warranty as described in the Simplified BSD License.

Abstract

   This memo provides guidance to TCP implementers that is intended to
   help improve convergence to steady-state operation without affecting
   interoperability.
   It updates and replaces RFC 2140's description of
   interdependent TCP control blocks and the ways that part of TCP
   state can be shared among similar concurrent or consecutive
   connections. TCP state includes a combination of parameters, such as
   connection state, current round-trip time estimates, congestion
   control information, and process information. Most of this state is
   maintained on a per-connection basis in the TCP Control Block (TCB),
   but implementations can (and do) share certain TCB information
   across connections to the same host. Such sharing is intended to
   improve overall transient transport performance, while maintaining
   backward-compatibility with existing implementations. The sharing
   described herein is limited to only the TCB initialization and so
   has no effect on the long-term behavior of TCP after a connection
   has been established.

Table of Contents

   1. Introduction
   2. Conventions used in this document
   3. Terminology
   4. The TCP Control Block (TCB)
   5. TCB Interdependence
   6. An Example of Temporal Sharing
   7. An Example of Ensemble Sharing
   8. Compatibility Issues
   9. Implications
   10. Implementation Observations
   11. Updates to RFC 2140
   12. Security Considerations
   13. IANA Considerations
   14. References
      14.1. Normative References
      14.2. Informative References
   15. Acknowledgments
   16. Change log
   17. Appendix A: TCB sharing history
   18. Appendix B: Options

1. Introduction

   TCP is a connection-oriented reliable transport protocol layered
   over IP [RFC793]. Each TCP connection maintains state, usually in a
   data structure called the TCP Control Block (TCB). The TCB contains
   information about the connection state, its associated local
   process, and feedback parameters about the connection's transmission
   properties. As originally specified and usually implemented, most
   TCB information is maintained on a per-connection basis. Some
   implementations can (and now do) share certain TCB information
   across connections to the same host [RFC2140]. Such sharing is
   intended to lead to better overall transient performance, especially
   for numerous short-lived and simultaneous connections, as often used
   in the World-Wide Web [Be94],[Br02]. This sharing of state is
   intended to help TCP connections converge to steady-state behavior
   more quickly without affecting TCP interoperability.

   This document updates RFC 2140's discussion of TCB state sharing and
   provides a complete replacement for that document. This state
   sharing affects only TCB initialization [RFC2140] and thus has no
   effect on the long-term behavior of TCP after a connection has been
   established nor on interoperability. Path information shared across
   SYN destination port numbers assumes that TCP segments having the
   same host-pair experience the same path properties, irrespective of
   TCP port numbers.
   The observations about TCB sharing in this
   document apply similarly to any protocol with congestion state,
   including SCTP [RFC4960] and DCCP [RFC4340], as well as for
   individual subflows in Multipath TCP [RFC6824].

2. Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

   However, this document is intended to describe behavior that is
   already permitted by TCP implementers. As a result, it provides
   informative guidance but does not use such normative language,
   except when quoting other documents.

3. Terminology

   Host - a source or sink of TCP segments associated with a single IP
   address

   Host-pair - a pair of hosts and their corresponding IP addresses

   Path - an Internet path between the IP addresses of two hosts

4. The TCP Control Block (TCB)

   A TCB describes the data associated with each connection, i.e., with
   each association of a pair of applications across the network. The
   TCB contains at least the following information [RFC793]:

      Local process state
         pointers to send and receive buffers
         pointers to retransmission queue and current segment
         pointers to Internet Protocol (IP) PCB
      Per-connection shared state
         macro-state
            connection state
            timers
            flags
            local and remote host numbers and ports
            TCP option state
         micro-state
            send and receive window state (size*, current number)
            round-trip time and variance
            cong. window size (snd_cwnd)*
            cong. window size threshold (ssthresh)*
            max window size seen*
            sendMSS#
            MMS_S#
            MMS_R#
            PMTU#
            round-trip time and variance#

   The per-connection information is shown as split into macro-state
   and micro-state, terminology borrowed from [Co91]. Macro-state
   describes the protocol for establishing the initial shared state
   about the connection; we include the endpoint numbers and components
   (timers, flags) required upon commencement that are later used to
   help maintain that state. Micro-state describes the protocol after a
   connection has been established, to maintain the reliability and
   congestion control of the data transferred in the connection.

   We further distinguish two other classes of shared micro-state that
   are associated more with host-pairs than with application pairs. One
   class is clearly host-pair dependent (#, e.g., MSS, MMS, PMTU, RTT),
   and the other is host-pair dependent in its aggregate (*, e.g.,
   congestion window information, current window sizes, etc.).

5. TCB Interdependence

   There are two cases of TCB interdependence. Temporal sharing occurs
   when the TCB of an earlier (now CLOSED) connection to a host is used
   to initialize some parameters of a new connection to that same host,
   i.e., in sequence. Ensemble sharing occurs when a currently active
   connection to a host is used to initialize another (concurrent)
   connection to that host.

6. An Example of Temporal Sharing

   The TCB data cache is accessed in two ways: it is read to initialize
   new TCBs and written when more current per-host state is available.
   New TCBs can be initialized using context from past connections as
   follows:

               TEMPORAL SHARING - TCB Initialization

      Cached TCB          New TCB
      --------------------------------------
      old_MMS_S           old_MMS_S or not cached

      old_MMS_R           old_MMS_R or not cached

      old_sendMSS         old_sendMSS

      old_PMTU            old_PMTU

      old_RTT             old_RTT

      old_RTTvar          old_RTTvar

      old_option          (option specific)

      old_ssthresh        old_ssthresh

      old_snd_cwnd        old_snd_cwnd

   Sections 8 and 9 discuss compatibility issues and implications of
   sharing the specific information listed above. Section 10 gives an
   overview of known implementations.

   Most cached TCB values are updated when a connection closes. The
   exceptions are MMS_R and MMS_S, which are reported by IP [RFC1122],
   PMTU, which is updated after Path MTU Discovery
   [RFC1191][RFC4821][RFC8201], and sendMSS, which is updated if the
   MSS option is received in the TCP SYN header.

   Sharing sendMSS information affects only data in the SYN of the next
   connection, because sendMSS information is typically included in
   most TCP SYN segments. Caching PMTU can accelerate the efficiency of
   PMTUD, but can also result in black-holing until corrected if in
   error. Caching MMS_R and MMS_S may be of little direct value as they
   are reported by the local IP stack anyway.

   The way in which other TCP option state can be shared depends on the
   details of that option. E.g., TFO state includes the TCP Fast Open
   Cookie [RFC7413] or, in case TFO fails, a negative TCP Fast Open
   response. RFC 7413 states, "The client MUST cache negative responses
   from the server in order to avoid potential connection failures.
   Negative responses include the server not acknowledging the data in
   the SYN, ICMP error messages, and (most importantly) no response
   (SYN-ACK) from the server at all, i.e., connection timeout."
   [RFC7413]. TFOinfo is cached when a connection is established.

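
   The read and write paths described above can be sketched in Python.
   This is a minimal illustration under stated assumptions, not a
   description of any particular stack: the CachedTcb and TcbCache
   names, the per-host keying by IP address string, and the use of an
   arithmetic mean for merge() are all hypothetical choices made for
   the example. This document leaves the merge function unspecified.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional


@dataclass
class CachedTcb:
    """Per-host state retained across connections (field names are hypothetical)."""
    send_mss: Optional[int] = None
    pmtu: Optional[int] = None
    rtt: Optional[float] = None        # seconds
    rttvar: Optional[float] = None     # seconds
    ssthresh: Optional[float] = None   # segments
    snd_cwnd: Optional[float] = None   # segments
    tfo_cookie: Optional[bytes] = None
    tfo_failed: bool = False           # cached negative TFO response


def merge(curr: float, old: Optional[float]) -> float:
    """Assumed merge: mean of old and current; the current value if no old one."""
    return curr if old is None else (curr + old) / 2


class TcbCache:
    def __init__(self) -> None:
        self._by_host: Dict[str, CachedTcb] = {}

    def init_new_tcb(self, remote_ip: str) -> CachedTcb:
        """Read path: copy cached values into a new TCB, or return defaults."""
        old = self._by_host.get(remote_ip)
        return CachedTcb() if old is None else replace(old)

    def on_mss_option(self, remote_ip: str, mss: int) -> None:
        """sendMSS is cached when an MSS option is received in a SYN."""
        self._by_host.setdefault(remote_ip, CachedTcb()).send_mss = mss

    def on_established(self, remote_ip: str, cookie: Optional[bytes],
                       tfo_failed: bool) -> None:
        """TFO state is cached when a connection is established."""
        entry = self._by_host.setdefault(remote_ip, CachedTcb())
        entry.tfo_cookie = cookie
        entry.tfo_failed = tfo_failed

    def on_close(self, remote_ip: str, curr: CachedTcb) -> None:
        """Write path: merge RTT and congestion state when a connection closes."""
        entry = self._by_host.setdefault(remote_ip, CachedTcb())
        for name in ("rtt", "rttvar", "ssthresh", "snd_cwnd"):
            value = getattr(curr, name)
            if value is not None:
                setattr(entry, name, merge(value, getattr(entry, name)))
```

   In this sketch, a first connection that closes with an RTT of 100 ms
   seeds the cache directly, and a second that closes with 200 ms moves
   the cached RTT to their mean. A real implementation would also have
   to decide how long cached values remain valid.
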
   Other TCP option state might not be as readily cached. E.g., TCP-AO
   [RFC5925] success or failure between a host pair for a single SYN
   destination port might be usefully cached. TCP-AO success or failure
   to other SYN destination ports on that host pair is never useful to
   cache because TCP-AO security parameters can vary per service.

   The table below gives an overview of option-specific information
   that can be shared.

                  TEMPORAL SHARING - Option info

      Cached               New
      ----------------------------------------
      old_TFO_Cookie       old_TFO_Cookie

      old_TFO_Failure      old_TFO_Failure

                  TEMPORAL SHARING - Cache Updates

      Cached TCB     Current TCB     when?    New Cached TCB
      ------------------------------------------------------
      old_MMS_S      curr_MMS_S      OPEN     curr_MMS_S

      old_MMS_R      curr_MMS_R      OPEN     curr_MMS_R

      old_sendMSS    curr_sendMSS    MSSopt   curr_sendMSS

      old_PMTU       curr_PMTU       PMTUD    curr_PMTU

      old_RTT        curr_RTT        CLOSE    merge(curr,old)

      old_RTTvar     curr_RTTvar     CLOSE    merge(curr,old)

      old_option     curr_option     ESTAB    (depends on option)

      old_ssthresh   curr_ssthresh   CLOSE    merge(curr,old)

      old_snd_cwnd   curr_snd_cwnd   CLOSE    merge(curr,old)

   Caching PMTU and sendMSS is trivial; reported values are cached, and
   the most recent values are used. The cache is updated when the MSS
   option is received in a SYN or after PMTUD (i.e., when an ICMPv4
   Fragmentation Needed [RFC1191] or ICMPv6 Packet Too Big message is
   received [RFC8201] or the equivalent is inferred, e.g., as from
   PLPMTUD [RFC4821]), respectively, so the cache always has the most
   recent values from any connection. For sendMSS, the cache is
   consulted only at connection establishment and not otherwise
   updated, which means that MSS options do not affect current
   connections. The default sendMSS is never saved; only reported MSS
   values update the cache, so an explicit override is required to
   reduce the sendMSS.
   There is no particular benefit to caching MMS_S
   and MMS_R as these are reported by the local IP stack.

   TCP options are copied or merged depending on the details of each
   option, where "merge" is some function that combines the values of
   "curr" and "old". E.g., TFO state is updated when a connection is
   established and read before establishing a new connection.

   RTT values are updated by formulae that merge the old and new
   values. Dynamic RTT estimation requires a sequence of RTT
   measurements. As a result, the cached RTT (and its variance) is an
   average of its previous value with the contents of the currently
   active TCB for that host, when a TCB is closed. RTT values are
   updated only when a connection is closed. The method for merging old
   and current values needs to attempt to reduce the transient for new
   connections.

   The updates for RTT, RTTvar and ssthresh rely on existing
   information, i.e., old values. Should no such values exist, the
   current values are cached instead.

               TEMPORAL SHARING - Option info Updates

      Cached            Current           when?   New Cached
      ----------------------------------------------------------------
      old_TFO_Cookie    old_TFO_Cookie    ESTAB   old_TFO_Cookie

      old_TFO_Failure   old_TFO_Failure   ESTAB   old_TFO_Failure

7. An Example of Ensemble Sharing

   Sharing cached TCB data across concurrent connections requires
   attention to the aggregate nature of some of the shared state. For
   example, although MSS and RTT values can be shared by copying, it
   may not be appropriate to simply copy congestion window or ssthresh
   information; instead, the new values can be a function (f) of the
   cumulative values and the number of connections (N).

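
   As one illustration of such a function f, the Python sketch below
   divides the cached aggregate evenly among the N existing connections
   plus the new one. The even split, the floor of one segment, and the
   name ensemble_initial are assumptions made for this example only;
   this document does not prescribe a particular f.

```python
def ensemble_initial(aggregate_sum: int, n_connections: int,
                     floor: int = 1) -> int:
    """Hypothetical f(sum, N): give the new connection an equal share.

    aggregate_sum  -- sum of cwnd (or ssthresh) over current connections,
                      in segments
    n_connections  -- number of current connections (N) to the same host
    floor          -- smallest value ever returned, in segments (assumed)
    """
    if n_connections < 0:
        raise ValueError("n_connections must be non-negative")
    # Split the aggregate across the existing connections and the new one.
    share = aggregate_sum // (n_connections + 1)
    return max(floor, share)
```

   For example, with one existing connection holding a 40-segment
   window, ensemble_initial(40, 1) gives the new connection 20
   segments. Whether the existing connection's window is reduced to
   match is a separate design decision that this sketch does not
   address.
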
               ENSEMBLE SHARING - TCB Initialization

      Cached TCB           New TCB
      ------------------------------------------------
      old_MMS_S            old_MMS_S

      old_MMS_R            old_MMS_R

      old_sendMSS          old_sendMSS

      old_PMTU             old_PMTU

      old_RTT              old_RTT

      old_RTTvar           old_RTTvar

      old_ssthresh_sum     f(old_ssthresh_sum, N)

      old_snd_cwnd_sum     f(old_snd_cwnd_sum, N)

      old_option           (option-specific)

   Sections 8 and 9 discuss compatibility issues and implications of
   sharing the specific information listed above.

   The table below gives an overview of option-specific information
   that can be shared.

                  ENSEMBLE SHARING - Option info

      Cached               New
      ----------------------------------------
      old_TFO_Cookie       old_TFO_Cookie

      old_TFO_Failure      old_TFO_Failure

                  ENSEMBLE SHARING - Cache Updates

      Cached TCB     Current TCB     when?      New Cached TCB
      ----------------------------------------------------------------
      old_MMS_S      curr_MMS_S      OPEN       curr_MMS_S

      old_MMS_R      curr_MMS_R      OPEN       curr_MMS_R

      old_sendMSS    curr_sendMSS    MSSopt     curr_sendMSS

      old_PMTU       curr_PMTU       PMTUD      curr_PMTU
                                     /PLPMTUD

      old_RTT        curr_RTT        update     rtt_update(old,curr)

      old_RTTvar     curr_RTTvar     update     rtt_update(old,curr)

      old_ssthresh   curr_ssthresh   update     adjust sum as appropriate

      old_snd_cwnd   curr_snd_cwnd   update     adjust sum as appropriate

      old_option     curr_option     (depends)  (option specific)

   For ensemble sharing, TCB information should be cached as early as
   possible, sometimes before a connection is closed. Otherwise,
   opening multiple concurrent connections may not result in TCB data
   sharing if no connection closes before others open. The amount of
   work involved in updating the aggregate average should be minimized,
   but the resulting value should be equivalent to having all values
   measured within a single connection. The function "rtt_update" in
   the ensemble sharing table indicates this operation, which occurs
   whenever the RTT would have been updated in the individual TCP
   connection.
   As a result, the cache contains the shared RTT
   variables, which no longer need to reside in the TCB.

   Congestion window size and ssthresh aggregation are more complicated
   in the concurrent case. When there is an ensemble of connections, we
   need to decide how that ensemble would have shared these variables,
   in order to derive initial values for new TCBs.

               ENSEMBLE SHARING - Option info Updates

      Cached            Current           when?   New Cached
      ----------------------------------------------------------------
      old_TFO_Cookie    old_TFO_Cookie    ESTAB   old_TFO_Cookie

      old_TFO_Failure   old_TFO_Failure   ESTAB   old_TFO_Failure

   Any assumption of this sharing can be incorrect because identical
   endpoint address pairs may not share network paths. In current
   implementations, new congestion windows are set at an initial value
   of 4-10 segments [RFC3390][RFC6928], so that the sum of the current
   windows is increased for any new connection. This can have
   detrimental consequences where several connections share a highly
   congested link.

   There are several ways to initialize the congestion window in a new
   TCB among an ensemble of current connections to a host. Current TCP
   implementations initialize it to four segments as standard [RFC3390]
   and 10 segments experimentally [RFC6928]. These approaches assume
   that new connections should behave as conservatively as possible.
   The algorithm described in [Ba12] adjusts the initial cwnd depending
   on the cwnd values of ongoing connections. There have also been
   suggestions to use the kind of sharing mechanisms described in this
   document over long timescales to adapt TCP's initial window
   automatically [To13].

8. Compatibility Issues

   For the congestion and current window information, the initial
   values computed by TCB interdependence may not be consistent with
   the long-term aggregate behavior of a set of concurrent connections
   between the same endpoints.
   Under conventional TCP congestion
   control, if a single existing connection has converged to a
   congestion window of 40 segments and two newly joining concurrent
   connections assume initial windows of 10 segments [RFC6928], the
   existing connection's window does not decrease to accommodate the
   additional load, and the connections can mutually interfere. One
   example of this is seen on low-bandwidth, high-delay links, where
   concurrent connections supporting Web traffic can collide because
   their initial windows were too large, even when set at one segment.

   The authors of [Hu12] recommend caching ssthresh for temporal
   sharing only when flows are long. Some studies suggest that sharing
   ssthresh between short flows can deteriorate the performance of
   individual connections [Hu12, Du16], although this may benefit
   aggregate network performance.

   Due to mechanisms like ECMP and LAG [RFC7424], TCP connections
   sharing the same host-pair may not always share the same path. This
   does not matter for host-specific information such as RWIN and TCP
   option state, such as TFOinfo. When TCB information is shared across
   different SYN destination ports, path-related information can be
   incorrect; however, the impact of this error is potentially
   diminished if (as discussed here) TCB sharing affects only the
   transient event of a connection start or if TCB information is
   shared only within connections to the same SYN destination port. In
   the case of temporal sharing, TCB information could also become
   invalid over time. Because this is similar to the case when a
   connection becomes idle, mechanisms that address idle TCP
   connections (e.g., [RFC7661]) could also be applied to TCB cache
   management, especially when TCP Fast Open is used [RFC7413].

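
   One simple way to limit the effect of stale temporal state is to
   discard cached entries after a fixed lifetime. The Python sketch
   below is illustrative only: the AgingCache name, the 600-second
   default lifetime, and the injectable clock are assumptions made for
   the example, and the approach is simpler than, and not equivalent
   to, the idle-connection mechanisms cited above.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class AgingCache:
    """Per-host cache whose entries expire after max_age seconds (hypothetical)."""

    def __init__(self, max_age: float = 600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max_age = max_age
        self._clock = clock
        # Maps remote IP -> (time stored, copy of the cached state).
        self._entries: Dict[str, Tuple[float, dict]] = {}

    def store(self, remote_ip: str, state: dict) -> None:
        """Record state for a host, stamped with the current time."""
        self._entries[remote_ip] = (self._clock(), dict(state))

    def lookup(self, remote_ip: str) -> Optional[dict]:
        """Return a copy of the cached state, or None if absent or expired."""
        item = self._entries.get(remote_ip)
        if item is None:
            return None
        stored_at, state = item
        if self._clock() - stored_at > self._max_age:
            # Too old to trust: drop it so new connections use defaults.
            del self._entries[remote_ip]
            return None
        return dict(state)
```

   A new connection that finds no usable entry simply falls back to
   the default initial values, which is the behavior of a stack without
   TCB sharing.
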
   There may be additional considerations to the way in which TCB
   interdependence rebalances congestion feedback among the current
   connections, e.g., it may be appropriate to consider the impact of a
   connection being in Fast Recovery [RFC5681] or some other similar
   unusual feedback state, e.g., as inhibiting or affecting the
   calculations described herein.

   TCP is sometimes used in situations where packets of the same host-
   pair do not always take the same path. Multipath routing that relies
   on examining transport headers, such as ECMP and LAG, may not result
   in repeatable path selection when TCP segments are encapsulated,
   encrypted, or altered - for example, in some Virtual Private Network
   (VPN) tunnels that rely on proprietary encapsulation. Similarly,
   such approaches cannot operate deterministically when the TCP header
   is encrypted, e.g., when using IPsec ESP. TCB interdependence among
   the entire set sharing the same endpoint IP addresses should work
   without problems under these circumstances. Moreover, measures to
   increase the probability that connections use the same path could be
   applied: e.g., the connections could be given the same IPv6 flow
   label. TCB interdependence can also be extended to sets of host IP
   address pairs that share the same network path conditions, such as
   when a group of addresses is on the same LAN (see Section 9).

   It can be wrong to share TCB information between TCP connections on
   the same host as identified by the IP address if an IP address is
   assigned to a new host (e.g., IP address spinning, as is used by
   ISPs to inhibit running servers). It can be wrong if Network Address
   (and Port) Translation (NA(P)T) [RFC2663] or any other IP sharing
   mechanism is used. Such mechanisms are less likely to be used with
   IPv6. Other methods to identify a host could also be considered to
   make correct TCB sharing more likely.
   Moreover, some TCB information
   is about dominant path properties rather than the specific host. IP
   addresses may differ, yet the relevant part of the path may be the
   same.

9. Implications

   There are several implications to incorporating TCB interdependence
   in TCP implementations. First, it may reduce the need for
   application-layer multiplexing for performance enhancement
   [RFC7231]. Protocols like HTTP/2 [RFC7540] avoid connection
   reestablishment costs by serializing or multiplexing a set of per-
   host connections across a single TCP connection. This avoids TCP's
   per-connection OPEN handshake and also avoids recomputing the MSS,
   RTT, and congestion window values. By avoiding the so-called "slow-
   start restart," performance can be optimized [Hu01]. TCB
   interdependence can provide the "slow-start restart avoidance" of
   multiplexing, without requiring a multiplexing mechanism at the
   application layer.

   TCB interdependence pushes some of the TCP implementation from the
   traditional transport layer (in the ISO model) to the network
   layer. This acknowledges that some state is in fact per-host-pair or
   can be per-path as indicated solely by that host-pair. Transport
   protocols typically manage per-application-pair associations (per
   stream), and network protocols manage per-host-pair and path
   associations (routing). Round-trip time, MSS, and congestion
   information could be more appropriately handled in a network-layer
   fashion, aggregated among concurrent connections, and shared across
   connection instances [RFC3124].

   An earlier version of RTT sharing suggested implementing RTT state
   at the IP layer, rather than at the TCP layer. Our observations
   describe sharing state among TCP connections, which avoids some of
   the difficulties in an IP-layer solution.
   One such problem of an IP
   layer solution is determining the correspondence between packet
   exchanges using IP header information alone, where such
   correspondence is needed to compute RTT. Because TCB sharing
   computes RTTs inside the TCP layer using TCP header information, it
   can be implemented more directly and simply than at the IP layer.
   This is a case where information should be computed at the transport
   layer, but could be shared at the network layer.

   Per-host-pair associations are not the limit of these techniques. It
   is possible that TCBs could be similarly shared between hosts on a
   subnet or within a cluster, because the predominant path can be
   subnet-subnet, rather than host-host. Additionally, TCB
   interdependence can be applied to any protocol with congestion
   state, including SCTP [RFC4960] and DCCP [RFC4340], as well as for
   individual subflows in Multipath TCP [RFC6824].

   There may be other information that can be shared between concurrent
   connections. For example, knowing that another connection has just
   tried to expand its window size and failed, a connection may not
   attempt to do the same for some period. The idea is that existing
   TCP implementations infer the behavior of all competing connections,
   including those within the same host or subnet. One possible
   optimization is to make that implicit feedback explicit, via
   extended information associated with the endpoint IP address and its
   TCP implementation, rather than per-connection state in the TCB.

   Like the initial version of this document [RFC2140], this update's
   approach to TCB interdependence focuses on sharing a set of TCBs by
   updating the TCB state to reduce the impact of transients when
   connections begin or end.
   Other mechanisms have since been proposed
   to continuously share information between all ongoing communication
   (including connectionless protocols), updating the congestion state
   during any congestion-related event (e.g., timeout, loss
   confirmation, etc.) [RFC3124]. By dealing exclusively with
   transients, TCB interdependence is more likely to exhibit the same
   behavior as unmodified, independent TCP connections.

10. Implementation Observations

   The observation that some TCB state is host-pair specific rather
   than application-pair dependent is not new and is a common
   engineering decision in layered protocol implementations. Although
   now deprecated, T/TCP [RFC1644] was the first to propose using
   caches in order to maintain TCB states (see Appendix A for more
   information).

   The table below describes the current implementation status for some
   TCB information in Linux kernel version 4.6, FreeBSD 10 and Windows
   (as of October 2016). In the table, "shared" only refers to temporal
   sharing.

      TCB data        Status
      -----------------------------------------------------------
      old_MMS_S       Not shared

      old_MMS_R       Not shared

      old_sendMSS     Cached and shared in Linux (MSS)

      old_PMTU        Cached and shared in FreeBSD and Windows (PMTU)

      old_RTT         Cached and shared in FreeBSD and Linux

      old_RTTvar      Cached and shared in FreeBSD

      old_TFOinfo     Cached and shared in Linux and Windows

      old_snd_cwnd    Not shared

      old_ssthresh    Cached and shared in FreeBSD and Linux:
                      FreeBSD: arithmetic mean of ssthresh and
                      previous value if a previous value exists;
                      Linux: depending on state, max(cwnd/2,
                      ssthresh) in most cases

11. Updates to RFC 2140

   This document updates the description of TCB sharing in RFC 2140 and
   its associated impact on existing and new connection state,
   providing a complete replacement for that document [RFC2140].
   It clarifies the previous description and terminology and extends
   the discussion to the mechanism's impact on newer protocols and
   mechanisms, including Multipath TCP, TCP Fast Open, PLPMTUD, NAT,
   and the TCP Authentication Option.

   The discussion of the impact on TCB state covers TCB parameters in
   greater detail: it addresses RSS in both the send and receive
   directions, treats MSS and send-MSS separately, adds path MTU and
   ssthresh, and addresses the impact on TCP option state.

   New sections have been added to address compatibility issues and
   implementation observations. The relation of this work to T/TCP has
   been moved to an appendix discussion on history, partly to reflect
   the deprecation of that protocol.

   Finally, this document updates and significantly expands the
   referenced literature.

12. Security Considerations

   The implementation methods presented here do not have additional
   ramifications for explicit attacks. They may be susceptible to
   denial-of-service attacks if not otherwise secured. For example, an
   application can open a connection and set its window size to zero,
   denying service to any subsequent connection between those hosts.

   TCB sharing may be susceptible to denial-of-service attacks wherever
   the TCB is shared: between connections in a single host, or between
   hosts if TCB sharing is implemented within a subnet (see
   Implications section). Some shared TCB parameters are used only to
   create new TCBs; others are shared among the TCBs of ongoing
   connections. New connections can join the ongoing set, e.g., to
   optimize send window size among a set of connections to the same
   host.

   Attacks on parameters used only for initialization affect only the
   transient performance of a TCP connection. For short connections,
   the performance ramification can approach that of a
   denial-of-service attack.
   For example, if an application changes its TCB to have a false,
   small window size, subsequent connections will experience
   performance degradation until their windows grow appropriately.

13. IANA Considerations

   There are no IANA implications or requests in this document.

   This section should be removed upon final publication as an RFC.

14. References

14.1. Normative References

   This document has no normative references.

14.2. Informative References

   [Br02]    Brownlee, N. and K. Claffy, "Understanding Internet
             Traffic Streams: Dragonflies and Tortoises", IEEE
             Communications Magazine, pp. 110-117, 2002.

   [Be94]    Berners-Lee, T., et al., "The World-Wide Web",
             Communications of the ACM, V37, Aug. 1994, pp. 76-82.

   [Br94]    Braden, B., "T/TCP -- Transaction TCP: Source Changes for
             Sun OS 4.1.3", Release 1.0, USC/ISI, September 14, 1994.

   [Co91]    Comer, D., Stevens, D., Internetworking with TCP/IP, V2,
             Prentice-Hall, NJ, 1991.

   [FreeBSD] FreeBSD source code, Release 2.10, http://www.freebsd.org/

   [Du16]    Dukkipati, N., Cheng, Y., and Vahdat, A., "Research
             Impacting the Practice of Congestion Control", ACM SIGCOMM
             CCR (editorial), on-line post, July 2016.

   [Hu01]    Hughes, A., Touch, J., Heidemann, J., "Issues in
             Slow-Start Restart After Idle", draft-hughes-restart-00
             (expired), Dec. 2001.

   [Hu12]    Hurtig, P., Brunstrom, A., "Enhanced metric caching for
             short TCP flows", 2012 IEEE International Conference on
             Communications (ICC), Ottawa, ON, 2012, pp. 1209-1213.

   [Ba12]    Barik, R., Welzl, M., Ferlin, S., Alay, O., "LISA: A
             Linked Slow-Start Algorithm for MPTCP", IEEE ICC, Kuala
             Lumpur, Malaysia, May 23-27, 2016.

   [RFC793]  Postel, J., "Transmission Control Protocol", Network
             Working Group RFC 793/STD 7, ISI, Sept. 1981.

   [RFC1122] Braden, R. (Ed.), "Requirements for Internet Hosts --
             Communication Layers", RFC 1122, Oct. 1989.

   [RFC1191] Mogul, J., Deering, S., "Path MTU Discovery", RFC 1191,
             Nov. 1990.

   [RFC1644] Braden, R., "T/TCP -- TCP Extensions for Transactions
             Functional Specification", RFC 1644, July 1994.

   [RFC1379] Braden, R., "Extending TCP for Transactions -- Concepts",
             RFC 1379, November 1992.

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
             Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2140] Touch, J., "TCP Control Block Interdependence", RFC 2140,
             April 1997.

   [RFC2663] Srisuresh, P., Holdrege, M., "IP Network Address
             Translator (NAT) Terminology and Considerations",
             RFC 2663, August 1999.

   [RFC3390] Allman, M., Floyd, S., Partridge, C., "Increasing TCP's
             Initial Window", RFC 3390, Oct. 2002.

   [RFC7231] Fielding, R., Reschke, J. (Eds.), "HTTP/1.1 Semantics and
             Content", RFC 7231, June 2014.

   [RFC3124] Balakrishnan, H., Seshan, S., "The Congestion Manager",
             RFC 3124, June 2001.

   [RFC4340] Kohler, E., Handley, M., Floyd, S., "Datagram Congestion
             Control Protocol (DCCP)", RFC 4340, Mar. 2006.

   [RFC4821] Mathis, M., Heffner, J., "Packetization Layer Path MTU
             Discovery", RFC 4821, Mar. 2007.

   [RFC4960] Stewart, R. (Ed.), "Stream Control Transmission
             Protocol", RFC 4960, Sept. 2007.

   [RFC5861] Allman, M., Paxson, V., Blanton, E., "TCP Congestion
             Control", RFC 5681, Sept. 2009.

   [RFC5925] Touch, J., Mankin, A., Bonica, R., "The TCP Authentication
             Option", RFC 5925, June 2010.

   [RFC6824] Ford, A., Raiciu, C., Handley, M., Bonaventure, O., "TCP
             Extensions for Multipath Operation with Multiple
             Addresses", RFC 6824, Jan. 2013.

   [RFC6928] Chu, J., Dukkipati, N., Cheng, Y., Mathis, M., "Increasing
             TCP's Initial Window", RFC 6928, Apr. 2013.

   [RFC7413] Cheng, Y., Chu, J., Radhakrishnan, S., Jain, A., "TCP Fast
             Open", RFC 7413, Dec. 2014.

   [RFC7424] Krishnan, R., Yong, L., Ghanwani, A., So, N., Khasnabish,
             B., "Mechanisms for Optimizing Link Aggregation Group
             (LAG) and Equal-Cost Multipath (ECMP) Component Link
             Utilization in Networks", RFC 7424, Jan. 2015.

   [RFC7540] Belshe, M., Peon, R., Thomson, M., "Hypertext Transfer
             Protocol Version 2 (HTTP/2)", RFC 7540, May 2015.

   [RFC7661] Fairhurst, G., Sathiaseelan, A., Secchi, R., "Updating TCP
             to Support Rate-Limited Traffic", RFC 7661, Oct. 2015.

   [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
             2119 Key Words", RFC 8174, May 2017.

   [RFC8201] McCann, J., Deering, S., Mogul, J., Hinden, R. (Ed.),
             "Path MTU Discovery for IP version 6", RFC 8201, Jul.
             2017.

   [To13]    Touch, J., "Automating the Initial Window in TCP",
             draft-touch-tcpm-automatic-iw-03 (expired), Jan. 2013.

15. Acknowledgments

   The authors would like to thank Praveen Balasubramanian for
   information regarding TCB sharing in Windows, and Yuchung Cheng,
   Lars Eggert, Ilpo Jarvinen, and Michael Scharf for comments on
   earlier versions of the draft. Earlier revisions of this work
   received funding from a collaborative research project between the
   University of Oslo and Huawei Technologies Co., Ltd. and were partly
   supported by USC/ISI's Postel Center.

   This document was prepared using 2-Word-v2.0.template.dot.

16. Change log

   This section should be removed upon final publication as an RFC.

   ietf-00:

   - Re-issued as draft-ietf-tcpm-2140bis due to WG adoption.
   - Cleaned orphan references to T/TCP, removed incomplete refs
   - Moved references to informative section and updated Sec 2
   - Updated to clarify no impact to interoperability
   - Updated appendix B to avoid 2119 language

   06:

   - Changed to update 2140, cite it normatively, and summarize the
      updates in a separate section

   05:

   - Fixed some TBDs.

   04:

   - Removed BCP-style recommendations and fixed some TBDs.

   03:

   - Updated Touch's affiliation and address information

   02:

   - Stated that our OS implementation overview table only covers
      temporal sharing.

   - Correctly reflected sharing of old_RTT in Linux in the
      implementation overview table.

   - Marked entries that are considered safe to share with an
      asterisk (suggestion was to split the table)

   - Discussed correct host identification: NATs may make IP
      addresses the wrong input, could e.g. use HTTP cookie.

   - Included MMS_S and MMS_R from RFC1122; fixed the use of MSS and
      MTU

   - Added information about option sharing, listed options in the
      appendix

Authors' Addresses

   Joe Touch
   Manhattan Beach, CA 90266
   USA

   Phone: +1 (310) 560-0334
   Email: touch@strayalpha.com

   Michael Welzl
   University of Oslo
   PO Box 1080 Blindern
   Oslo N-0316
   Norway

   Phone: +47 22 85 24 20
   Email: michawe@ifi.uio.no

   Safiqul Islam
   University of Oslo
   PO Box 1080 Blindern
   Oslo N-0316
   Norway

   Phone: +47 22 84 08 37
   Email: safiquli@ifi.uio.no

17. Appendix A: TCB sharing history

   T/TCP proposed using caches to maintain TCB information across
   instances (temporal sharing), e.g., smoothed RTT, RTT variance,
   congestion avoidance threshold, and MSS [RFC1644]. These values were
   in addition to connection counts used by T/TCP to accelerate data
   delivery prior to the full three-way handshake during an OPEN. The
   goal was to aggregate TCB components where they reflect one
   association, that of the host-pair, rather than artificially
   separating those components by connection.

   At least one T/TCP implementation saved the MSS and aggregated the
   RTT parameters across multiple connections, but omitted caching the
   congestion window information [Br94], as originally specified in
   [RFC1379].
   Some T/TCP implementations immediately updated MSS when the TCP MSS
   header option was received [Br94], although this was not addressed
   specifically in the concepts or functional specification
   [RFC1379][RFC1644]. In later T/TCP implementations, RTT values were
   updated only after a CLOSE, which does not benefit concurrent
   sessions.

   Temporal sharing of cached TCB data was originally implemented in
   the SunOS 4.1.3 T/TCP extensions [Br94] and the FreeBSD port of the
   same [FreeBSD]. As mentioned before, only the MSS and RTT parameters
   were cached, as originally specified in [RFC1379]. Later discussion
   of T/TCP suggested including congestion control parameters in this
   cache; for example, [RFC1644] (Section 3.1) hints at initializing
   the congestion window to the old window size.

18. Appendix B: Options

   In addition to the options that can be cached and shared, this memo
   also lists known options for which keeping state is unsafe. This
   list is meant to avoid work duplication and should be removed upon
   publication.

   Obsolete (unsafe to keep state):

      ECHO

      ECHO REPLY

      PO Conn permitted

      PO service profile

      CC

      CC.NEW

      CC.ECHO

      Alt CS req

      Alt CS data

   No state to keep:

      EOL

      NOP

      WS

      SACK

      TS

      MD5

      TCP-AO

      EXP1

      EXP2

   Unsafe to keep state:

      Skeeter (DH exchange - might be obsolete, though)

      Bubba (DH exchange - might really be obsolete, though)

      Trailer CS

      SCPS capabilities

      S-NACK

      Records boundaries

      Corruption experienced

      SNAP

      TCP Compression

      Quickstart response

      UTO

      MPTCP (can we cache when this fails?)

      TFO success

   Safe but optional to keep state:

      MSS

      TFO failure (so we don't try again, since it's optional)

   Safe and necessary to keep state:

      TFO cookie (if TFO succeeded in the past)
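
   As an illustration only, the classification above could be encoded
   as a lookup table that a TCB cache consults before storing option
   state. The following Python sketch is not taken from any
   implementation: the names StatePolicy, OPTION_POLICY, may_cache, and
   must_cache are hypothetical, the option names are copied from the
   lists above without their parenthetical remarks, and treating
   unlisted options as unsafe is an assumption of the sketch.

```python
from enum import Enum


class StatePolicy(Enum):
    """Categories used in the option lists above."""
    OBSOLETE = "obsolete (unsafe to keep state)"
    NO_STATE = "no state to keep"
    UNSAFE = "unsafe to keep state"
    SAFE_OPTIONAL = "safe but optional to keep state"
    SAFE_NECESSARY = "safe and necessary to keep state"


# Hypothetical grouping; names mirror the appendix lists.
_GROUPS = {
    StatePolicy.OBSOLETE: [
        "ECHO", "ECHO REPLY", "PO Conn permitted", "PO service profile",
        "CC", "CC.NEW", "CC.ECHO", "Alt CS req", "Alt CS data",
    ],
    StatePolicy.NO_STATE: [
        "EOL", "NOP", "WS", "SACK", "TS", "MD5", "TCP-AO", "EXP1", "EXP2",
    ],
    StatePolicy.UNSAFE: [
        "Skeeter", "Bubba", "Trailer CS", "SCPS capabilities", "S-NACK",
        "Records boundaries", "Corruption experienced", "SNAP",
        "TCP Compression", "Quickstart response", "UTO", "MPTCP",
        "TFO success",
    ],
    StatePolicy.SAFE_OPTIONAL: ["MSS", "TFO failure"],
    StatePolicy.SAFE_NECESSARY: ["TFO cookie"],
}

# Flatten the groups into a per-option lookup table.
OPTION_POLICY = {
    name: policy for policy, names in _GROUPS.items() for name in names
}


def may_cache(option):
    """True only if the option is listed as safe to keep state for.

    Options missing from the table are treated as unsafe; this
    conservative default is an assumption of the sketch.
    """
    policy = OPTION_POLICY.get(option)
    return policy in (StatePolicy.SAFE_OPTIONAL, StatePolicy.SAFE_NECESSARY)


def must_cache(option):
    """True if the option is listed as necessary to keep state for."""
    return OPTION_POLICY.get(option) is StatePolicy.SAFE_NECESSARY
```

   With such a table, a cache would store the TFO cookie, could store
   MSS or a TFO failure indication, and would decline to store state
   for options such as UTO or SACK.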