idnits 2.17.1 draft-touch-tcpm-2140bis-06.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- No issues found here. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year == Line 227 has weird spacing: '...sthresh old_...' == Line 229 has weird spacing: '...nd_cwnd old_...' == Line 282 has weird spacing: '... MSSopt curr_...' == Line 292 has weird spacing: '...sthresh curr...' == Line 294 has weird spacing: '...nd_cwnd curr...' == (6 more instances...) == The document seems to contain a disclaimer for pre-RFC5378 work, but was first submitted on or after 10 November 2008. The disclaimer is usually necessary only for documents that revise or obsolete older RFCs, and that take significant amounts of text from those RFCs. If you can contact all authors of the source material and they are willing to grant the BCP78 rights to the IETF Trust, you can and should remove the disclaimer. Otherwise, the disclaimer is needed and you can ignore this comment. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (January 4, 2019) is 1938 days in the past. Is this intentional? 
Checking references for intended status: Informational ---------------------------------------------------------------------------- ** Obsolete normative reference: RFC 793 (Obsoleted by RFC 9293) ** Obsolete normative reference: RFC 2140 (Obsoleted by RFC 9040) -- Obsolete informational reference (is this intentional?): RFC 1644 (Obsoleted by RFC 6247) -- Obsolete informational reference (is this intentional?): RFC 1379 (Obsoleted by RFC 6247) -- Obsolete informational reference (is this intentional?): RFC 7231 (Obsoleted by RFC 9110) -- Obsolete informational reference (is this intentional?): RFC 4960 (Obsoleted by RFC 9260) -- Obsolete informational reference (is this intentional?): RFC 6824 (Obsoleted by RFC 8684) -- Obsolete informational reference (is this intentional?): RFC 7540 (Obsoleted by RFC 9113) Summary: 2 errors (**), 0 flaws (~~), 8 warnings (==), 7 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 1 TCPM WG J. Touch 2 Internet Draft Independent Consultant 3 Intended status: Informational M. Welzl 4 Obsoletes: 2140 S. Islam 5 Expires: July 2019 University of Oslo 6 January 4, 2019 8 TCP Control Block Interdependence 9 draft-touch-tcpm-2140bis-06.txt 11 Status of this Memo 13 This Internet-Draft is submitted in full conformance with the 14 provisions of BCP 78 and BCP 79. 16 This document may contain material from IETF Documents or IETF 17 Contributions published or made publicly available before November 18 10, 2008. The person(s) controlling the copyright in some of this 19 material may not have granted the IETF Trust the right to allow 20 modifications of such material outside the IETF Standards Process. 
21 Without obtaining an adequate license from the person(s) controlling 22 the copyright in such materials, this document may not be modified 23 outside the IETF Standards Process, and derivative works of it may 24 not be created outside the IETF Standards Process, except to format 25 it for publication as an RFC or to translate it into languages other 26 than English. 28 Internet-Drafts are working documents of the Internet Engineering 29 Task Force (IETF), its areas, and its working groups. Note that 30 other groups may also distribute working documents as Internet- 31 Drafts. 33 Internet-Drafts are draft documents valid for a maximum of six 34 months and may be updated, replaced, or obsoleted by other documents 35 at any time. It is inappropriate to use Internet-Drafts as 36 reference material or to cite them other than as "work in progress." 38 The list of current Internet-Drafts can be accessed at 39 http://www.ietf.org/ietf/1id-abstracts.txt 41 The list of Internet-Draft Shadow Directories can be accessed at 42 http://www.ietf.org/shadow.html 44 This Internet-Draft will expire on July 4, 2019. 46 Copyright Notice 48 Copyright (c) 2019 IETF Trust and the persons identified as the 49 document authors. All rights reserved. 51 This document is subject to BCP 78 and the IETF Trust's Legal 52 Provisions Relating to IETF Documents 53 (http://trustee.ietf.org/license-info) in effect on the date of 54 publication of this document. Please review these documents 55 carefully, as they describe your rights and restrictions with 56 respect to this document. 58 Abstract 60 This memo updates and replaces RFC 2140's description of 61 interdependent TCP control blocks, in which part of the TCP state is 62 shared among similar concurrent or consecutive connections. TCP 63 state includes a combination of parameters, such as connection 64 state, current round-trip time estimates, congestion control 65 information, and process information. 
Most of this state is
66     maintained on a per-connection basis in the TCP Control Block (TCB),
67     but implementations can (and do) share certain TCB information
68     across connections to the same host. Such sharing is intended to
69     improve overall transient transport performance while maintaining
70     backward compatibility with existing implementations. The sharing
71     described herein is limited to TCB initialization and so
72     has no effect on the long-term behavior of TCP after a connection
73     has been established.

75  Table of Contents

77     1. Introduction
78     2. Conventions used in this document
79     3. Terminology
80     4. The TCP Control Block (TCB)
81     5. TCB Interdependence
82     6. An Example of Temporal Sharing
83     7. An Example of Ensemble Sharing
84     8. Compatibility Issues
85     9. Implications
86     10. Implementation Observations
87     11. Updates to RFC 2140
88     12. Security Considerations
89     13. IANA Considerations
90     14. References
91        14.1. Normative References
92        14.2. Informative References
93     15. Acknowledgments
94     16. Change log
95     17. Appendix A: TCB sharing history
96     18. Appendix B: Options

98  1.
Introduction 100 TCP is a connection-oriented reliable transport protocol layered 101 over IP [RFC793]. Each TCP connection maintains state, usually in a 102 data structure called the TCP Control Block (TCB). The TCB contains 103 information about the connection state, its associated local 104 process, and feedback parameters about the connection's transmission 105 properties. As originally specified and usually implemented, most 106 TCB information is maintained on a per-connection basis. Some 107 implementations can (and now do) share certain TCB information 108 across connections to the same host [RFC2140]. Such sharing is 109 intended to lead to better overall transient performance, especially 110 for numerous short-lived and simultaneous connections, as often used 111 in the World-Wide Web [Be94],[Br02]. 113 This document updates RFC 2140's discussion of TCB state sharing and 114 provides a complete replacement for that document. This state 115 sharing affects only TCB initialization [RFC2140] and thus has no 116 effect on the long-term behavior of TCP after a connection has been 117 established. Path information shared across SYN destination port 118 numbers assumes that TCP segments having the same host-pair 119 experience the same path properties, irrespective of TCP port 120 numbers. The observations about TCB sharing in this document apply 121 similarly to any protocol with congestion state, including SCTP 122 [RFC4960] and DCCP [RFC4340], as well as for individual subflows in 123 Multipath TCP [RFC6824]. 125 2. Conventions used in this document 127 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 128 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 129 document are to be interpreted as described in RFC 2119 [RFC2119]. 131 In this document, these words will appear with that interpretation 132 only when in ALL CAPS. 
Lowercase uses of these words are not to be
133     interpreted as carrying the significance described in RFC 2119.

135     In this document, the characters ">>" preceding an indented line(s)
136     indicate a statement using the key words listed above. This
137     convention aids reviewers in quickly identifying or finding the
138     portions of this RFC covered by these keywords.

140  3. Terminology

142     Host - a source or sink of TCP segments associated with a single IP
143     address

145     Host-pair - a pair of hosts and their corresponding IP addresses

147     Path - an Internet path between the IP addresses of two hosts

149  4. The TCP Control Block (TCB)

151     A TCB describes the data associated with each connection, i.e., with
152     each association of a pair of applications across the network. The
153     TCB contains at least the following information [RFC793]:

155       Local process state
156           pointers to send and receive buffers
157           pointers to retransmission queue and current segment
158           pointers to Internet Protocol (IP) PCB
159       Per-connection shared state
160           macro-state
161               connection state
162               timers
163               flags
164               local and remote host numbers and ports
165               TCP option state
166           micro-state
167               send and receive window state (size*, current number)
169               congestion window size (snd_cwnd)*
170               congestion window size threshold (ssthresh)*
171               max window size seen*
172               sendMSS#
173               MMS_S#
174               MMS_R#
175               PMTU#
176               round-trip time and variance#

178     The per-connection information is shown as split into macro-state
179     and micro-state, terminology borrowed from [Co91]. Macro-state
180     describes the protocol for establishing the initial shared state
181     about the connection; we include the endpoint numbers and components
182     (timers, flags) required upon commencement that are later used to
183     help maintain that state.
Micro-state describes the protocol after a
184     connection has been established, to maintain the reliability and
185     congestion control of the data transferred in the connection.

187     We further distinguish two other classes of shared micro-state that
188     are associated more with host-pairs than with application pairs. One
189     class is clearly host-pair dependent (#, e.g., MSS, MMS, PMTU, RTT),
190     and the other is host-pair dependent in its aggregate (*, e.g.,
191     congestion window information and current window sizes).

193  5. TCB Interdependence

195     There are two cases of TCB interdependence. Temporal sharing occurs
196     when the TCB of an earlier (now CLOSED) connection to a host is used
197     to initialize some parameters of a new connection to that same host,
198     i.e., in sequence. Ensemble sharing occurs when a currently active
199     connection to a host is used to initialize another (concurrent)
200     connection to that host.

202  6. An Example of Temporal Sharing

204     The TCB data cache is accessed in two ways: it is read to initialize
205     new TCBs and written when more current per-host state is available.
206     New TCBs can be initialized using context from past connections as
207     follows:

209                 TEMPORAL SHARING - TCB Initialization

211            Cached TCB           New TCB
212            --------------------------------------
213            old_MMS_S            old_MMS_S or not cached

215            old_MMS_R            old_MMS_R or not cached

217            old_sendMSS          old_sendMSS

219            old_PMTU             old_PMTU

221            old_RTT              old_RTT

223            old_RTTvar           old_RTTvar

225            old_option           (option-specific)

227            old_ssthresh         old_ssthresh

229            old_snd_cwnd         old_snd_cwnd

231     Sections 8 and 9 discuss compatibility issues and implications of
232     sharing the specific information listed above. Section 10 gives an
233     overview of known implementations.

235     Most cached TCB values are updated when a connection closes.
The
236     exceptions are MMS_R and MMS_S, which are reported by IP [RFC1122];
237     PMTU, which is updated after Path MTU Discovery
238     [RFC1191][RFC4821][RFC8201]; and sendMSS, which is updated if the
239     MSS option is received in the TCP SYN header.

241     Sharing sendMSS information affects only data in the SYN of the next
242     connection, because sendMSS information is typically included in
243     most TCP SYN segments. Caching PMTU can speed up PMTUD, but an
244     erroneous cached value can result in black-holing until it is
245     corrected. Caching MMS_R and MMS_S may be of little direct value as
246     they are reported by the local IP stack anyway.

248     The way in which other TCP option state can be shared depends on the
249     details of that option. E.g., TFO state includes the TCP Fast Open
250     Cookie [RFC7413] or, in case TFO fails, a negative TCP Fast Open
251     response. RFC 7413 states, "The client MUST cache negative responses
252     from the server in order to avoid potential connection failures.
253     Negative responses include the server not acknowledging the data in
254     the SYN, ICMP error messages, and (most importantly) no response
255     (SYN-ACK) from the server at all, i.e., connection timeout."
256     [RFC7413]. TFOinfo is cached when a connection is established.

258     Other TCP option state might not be as readily cached. E.g., TCP-AO
259     [RFC5925] success or failure between a host pair for a single SYN
260     destination port might be usefully cached. TCP-AO success or failure
261     to other SYN destination ports on that host pair is never useful to
262     cache because TCP-AO security parameters can vary per service.

264     The table below gives an overview of option-specific information
265     that can be shared.

267                    TEMPORAL SHARING - Option info

269            Cached               New
270            ----------------------------------------
271            old_TFO_Cookie       old_TFO_Cookie

273            old_TFO_Failure      old_TFO_Failure
274                    TEMPORAL SHARING - Cache Updates

276     Cached TCB     Current TCB     when?    New Cached TCB
277     ------------------------------------------------------
278     old_MMS_S      curr_MMS_S      OPEN     curr_MMS_S

280     old_MMS_R      curr_MMS_R      OPEN     curr_MMS_R

282     old_sendMSS    curr_sendMSS    MSSopt   curr_sendMSS

284     old_PMTU       curr_PMTU       PMTUD    curr_PMTU

286     old_RTT        curr_RTT        CLOSE    merge(curr,old)

288     old_RTTvar     curr_RTTvar     CLOSE    merge(curr,old)

290     old_option     curr_option     ESTAB    (depends on option)

292     old_ssthresh   curr_ssthresh   CLOSE    merge(curr,old)

294     old_snd_cwnd   curr_snd_cwnd   CLOSE    merge(curr,old)

296     Caching PMTU and sendMSS is trivial; reported values are cached, and
297     the most recent values are used. The cache is updated when the MSS
298     option is received in a SYN or after PMTUD (i.e., when an ICMPv4
299     Fragmentation Needed [RFC1191] or ICMPv6 Packet Too Big message is
300     received [RFC8201] or the equivalent is inferred, e.g., as from
301     PLPMTUD [RFC4821]), respectively, so the cache always has the most
302     recent values from any connection. For sendMSS, the cache is
303     consulted only at connection establishment and not otherwise
304     updated, which means that MSS options do not affect current
305     connections. The default sendMSS is never saved; only reported MSS
306     values update the cache, so an explicit override is required to
307     reduce the sendMSS. There is no particular benefit to caching MMS_S
308     and MMS_R as these are reported by the local IP stack.

310     TCP options are copied or merged depending on the details of each
311     option, where "merge" is some function that combines the values of
312     "curr" and "old". E.g., TFO state is updated when a connection is
313     established and read before establishing a new connection.

315     RTT values are updated by a more complicated mechanism
316     [RFC1644][Ja86]. Dynamic RTT estimation requires a sequence of RTT
317     measurements. As a result, the cached RTT (and its variance) is an
318     average of its previous value with the contents of the currently
319     active TCB for that host, when a TCB is closed.
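The merge(curr,old) entries in the Cache Updates table can be sketched as follows. This is a minimal illustration, not a definitive implementation. The class CachedTCB, the functions merge and update_on_close, the millisecond units, and the equal-weight average are all assumptions made for this example; the text says only that the cached value is an average of its previous value and the value in the closing TCB. The sketch also assumes that the current value is stored unchanged when nothing has been cached yet.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CachedTCB:
    """Hypothetical per-host cache entry; None means nothing cached yet."""
    rtt_ms: Optional[float] = None
    rttvar_ms: Optional[float] = None
    ssthresh: Optional[float] = None   # in segments
    snd_cwnd: Optional[float] = None   # in segments


def merge(curr: float, old: Optional[float]) -> float:
    """Assumed merge(): equal-weight average of current and cached values.

    With no cached value, the current value is stored as is.
    """
    if old is None:
        return curr
    return (curr + old) / 2.0


def update_on_close(cache: Dict[str, CachedTCB], host: str, rtt_ms: float,
                    rttvar_ms: float, ssthresh: float,
                    snd_cwnd: float) -> CachedTCB:
    """Fold a closing connection's values into the cache entry for host."""
    entry = cache.setdefault(host, CachedTCB())
    entry.rtt_ms = merge(rtt_ms, entry.rtt_ms)
    entry.rttvar_ms = merge(rttvar_ms, entry.rttvar_ms)
    entry.ssthresh = merge(ssthresh, entry.ssthresh)
    entry.snd_cwnd = merge(snd_cwnd, entry.snd_cwnd)
    return entry


if __name__ == "__main__":
    cache: Dict[str, CachedTCB] = {}
    # Two connections to the same host close one after the other.
    update_on_close(cache, "192.0.2.1", 100.0, 20.0, 32.0, 40.0)
    print(update_on_close(cache, "192.0.2.1", 200.0, 40.0, 16.0, 20.0))
```

A weighted average that favors either the cached or the current value would fit the same interface; the equal weighting here is only the simplest choice.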
RTT values are
320     updated only when a connection is closed. The method for merging old
321     and current values should aim to reduce the transient for new
322     connections.

324     The updates for RTT, RTTvar, and ssthresh rely on existing
325     information, i.e., old values. Should no such values exist, the
326     current values are cached instead.

328                 TEMPORAL SHARING - Option info Updates

330     Cached            Current           when?   New Cached
331     ----------------------------------------------------------------
332     old_TFO_Cookie    old_TFO_Cookie    ESTAB   old_TFO_Cookie

334     old_TFO_Failure   old_TFO_Failure   ESTAB   old_TFO_Failure

336  7. An Example of Ensemble Sharing

338     Sharing cached TCB data across concurrent connections requires
339     attention to the aggregate nature of some of the shared state. For
340     example, although MSS and RTT values can be shared by copying, it
341     may not be appropriate to simply copy congestion window or ssthresh
342     information; instead, the new values can be a function (f) of the
343     cumulative values and the number of connections (N).

345                 ENSEMBLE SHARING - TCB Initialization

347            Cached TCB           New TCB
348            --------------------------------
349            old_MMS_S            old_MMS_S

351            old_MMS_R            old_MMS_R

353            old_sendMSS          old_sendMSS

355            old_PMTU             old_PMTU

357            old_RTT              old_RTT

359            old_RTTvar           old_RTTvar

361            old_ssthresh_sum     f(old_ssthresh_sum, N)

363            old_snd_cwnd_sum     f(old_snd_cwnd_sum, N)

365            old_option           (option-specific)

367     Sections 8 and 9 discuss compatibility issues and implications of
368     sharing the specific information listed above.

370     The table below gives an overview of option-specific information
371     that can be shared.

373                    ENSEMBLE SHARING - Option info

375            Cached               New
376            ----------------------------------------
377            old_TFO_Cookie       old_TFO_Cookie

379            old_TFO_Failure      old_TFO_Failure

381                   ENSEMBLE SHARING - Cache Updates

383     Cached TCB     Current TCB     when?    New Cached TCB
384     -----------------------------------------------------
385     old_MMS_S      curr_MMS_S      OPEN     curr_MMS_S

387     old_MMS_R      curr_MMS_R      OPEN     curr_MMS_R

389     old_sendMSS    curr_sendMSS    MSSopt   curr_sendMSS

391     old_PMTU       curr_PMTU       PMTUD    curr_PMTU
392                                    /PLPMTUD

394     old_RTT        curr_RTT        update   rtt_update(old,curr)

396     old_RTTvar     curr_RTTvar     update   rtt_update(old,curr)

398     old_ssthresh   curr_ssthresh   update   adjust sum as appropriate

400     old_snd_cwnd   curr_snd_cwnd   update   adjust sum as appropriate

402     old_option     curr_option     (depends) (option-specific)

404     For ensemble sharing, TCB information should be cached as early as
405     possible, sometimes before a connection is closed. Otherwise,
406     opening multiple concurrent connections may not result in TCB data
407     sharing if no connection closes before others open. The amount of
408     work involved in updating the aggregate average should be minimized,
409     but the resulting value should be equivalent to having all values
410     measured within a single connection. The function "rtt_update" in
411     the ensemble sharing table indicates this operation, which occurs
412     whenever the RTT would have been updated in the individual TCP
413     connection. As a result, the cache contains the shared RTT
414     variables, which no longer need to reside in the TCB [Ja86].

416     Congestion window size and ssthresh aggregation are more complicated
417     in the concurrent case. When there is an ensemble of connections, we
418     need to decide how that ensemble would have shared these variables,
419     in order to derive initial values for new TCBs.

421                 ENSEMBLE SHARING - Option info Updates

423     Cached            Current           when?   New Cached
424     ----------------------------------------------------------------
425     old_TFO_Cookie    old_TFO_Cookie    ESTAB   old_TFO_Cookie

427     old_TFO_Failure   old_TFO_Failure   ESTAB   old_TFO_Failure

429     Any assumption of this sharing can be incorrect because identical
430     endpoint address pairs may not share network paths.
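The function f(sum, N) in the ensemble sharing initialization table is left open by the text. Purely as an illustration, one possible choice is an even split of the cached aggregate among the N active connections plus the new one. The name f comes from the table; the even split, the floor of one segment, and the convention that N counts only the connections already active are assumptions of this sketch, not something the document specifies.

```python
def f(cached_sum: float, n: int, floor: float = 1.0) -> float:
    """Hypothetical f(sum, N): equal share of the aggregate window.

    cached_sum is the cached aggregate (e.g., old_snd_cwnd_sum) in segments,
    and n is the number of connections already active to the host. The new
    connection is counted as connection n + 1 (an assumption of this sketch),
    and the result never drops below `floor` segments.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    return max(floor, cached_sum / (n + 1))


if __name__ == "__main__":
    # One active connection with a 40-segment window; a second one joins.
    print(f(40.0, 1))
```

Other choices, such as handing the new connection a fixed fraction of the aggregate, would fit the same signature; the text describes only the inputs to f.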
In current
431     implementations, new congestion windows are set at an initial value
432     of 4-10 segments [RFC3390][RFC6928], so that the sum of the current
433     windows is increased for any new connection. This can have
434     detrimental consequences where several connections share a highly
435     congested link.

437     There are several ways to initialize the congestion window in a new
438     TCB among an ensemble of current connections to a host. Current TCP
439     implementations initialize it to four segments as standard [RFC3390]
440     and 10 segments experimentally [RFC6928], and T/TCP hinted that it
441     should be initialized to the old window size [RFC1644]. In the
442     former cases, the assumption is that new connections should behave
443     as conservatively as possible. In the latter T/TCP case, no
444     accommodation is made for concurrent aggregate behavior. The
445     algorithm described in [Ba12] adjusts the initial cwnd depending on
446     the cwnd values of ongoing connections.

448  8. Compatibility Issues

450     For the congestion and current window information, the initial
451     values computed by TCB interdependence may not be consistent with
452     the long-term aggregate behavior of a set of concurrent connections
453     between the same endpoints. Under conventional TCP congestion
454     control, if a single existing connection has converged to a
455     congestion window of 40 segments and two newly joining concurrent
456     connections assume initial windows of 10 segments [RFC6928], the
457     existing connection's window doesn't decrease to accommodate this
458     additional load, and the connections can mutually interfere. One
459     example of this is seen on low-bandwidth, high-delay links, where
460     concurrent connections supporting Web traffic can collide because
461     their initial windows were too large, even when set at one segment.

463     The authors of [Hu12] recommend caching ssthresh for temporal
464     sharing only when flows are long.
Some studies suggest that sharing
465     ssthresh between short flows can degrade the performance of
466     individual connections [Hu12][Du16], although this may benefit
467     aggregate network performance.

469     Due to mechanisms like ECMP and LAG [RFC7424], TCP connections
470     sharing the same host-pair may not always share the same path. This
471     does not matter for host-specific information such as RWIN and TCP
472     option state (e.g., TFOinfo). When TCB information is shared across
473     different SYN destination ports, path-related information can be
474     incorrect; however, the impact of this error is potentially
475     diminished if (as discussed here) TCB sharing affects only the
476     transient event of a connection start or if TCB information is
477     shared only within connections to the same SYN destination port. In
478     the case of temporal sharing, TCB information could also become
479     invalid over time. Because this is similar to the case when a
480     connection becomes idle, mechanisms that address idle TCP connections
481     (e.g., [RFC7661]) could also be applied to TCB cache management,
482     especially when TCP Fast Open is used [RFC7413].

484     There may be additional considerations to the way in which TCB
485     interdependence rebalances congestion feedback among the current
486     connections, e.g., it may be appropriate to consider the impact of a
487     connection being in Fast Recovery [RFC5681] or some other similar
488     unusual feedback state, e.g., as inhibiting or affecting the
489     calculations described herein.

491     TCP is sometimes used in situations where packets of the same
492     host-pair always take the same path. Because ECMP and LAG examine TCP
493     port numbers, they may not be supported when TCP segments are
494     encapsulated, encrypted, or altered; for example, some Virtual
495     Private Networks (VPNs) are known to use proprietary UDP
496     encapsulation methods. Similarly, they cannot operate when the TCP
497     header is encrypted, e.g., when using IPsec ESP.
TCB interdependence 498 among the entire set sharing the same endpoint IP addresses should 499 work without problems under these circumstances. Moreover, measures 500 to increase the probability that connections use the same path could 501 be applied: e.g., the connections could be given the same IPv6 flow 502 label. TCB interdependence can also be extended to sets of host IP 503 address pairs that share the same network path conditions, such as 504 when a group of addresses is on the same LAN (see Section 9). 506 It can be wrong to share TCB information between TCP connections on 507 the same host as identified by the IP address if an IP address is 508 assigned to a new host (e.g., IP address spinning, as is used by 509 ISPs to inhibit running servers). It can be wrong if Network Address 510 (and Port) Translation (NA(P)T) [RFC2663] or any other IP sharing 511 mechanism is used. Such mechanisms are less likely to be used with 512 IPv6. Other methods to identify a host could also be considered to 513 make correct TCB sharing more likely. Moreover, some TCB information 514 is about dominant path properties rather than the specific host. IP 515 addresses may differ, yet the relevant part of the path may be the 516 same. 518 9. Implications 520 There are several implications to incorporating TCB interdependence 521 in TCP implementations. First, it may reduce the need for 522 application-layer multiplexing for performance enhancement 523 [RFC7231]. Protocols like HTTP/2 [RFC7540] avoid connection 524 reestablishment costs by serializing or multiplexing a set of per- 525 host connections across a single TCP connection. This avoids TCP's 526 per-connection OPEN handshake and also avoids recomputing the MSS, 527 RTT, and congestion window values. By avoiding the so-called, "slow- 528 start restart," performance can be optimized [Hu01]. 
TCB 529 interdependence can provide the "slow-start restart avoidance" of 530 multiplexing, without requiring a multiplexing mechanism at the 531 application layer. 533 TCB interdependence pushes some of the TCP implementation from the 534 traditional transport layer (in the ISO model), to the network 535 layer. This acknowledges that some state is in fact per-host-pair or 536 can be per-path as indicated solely by that host-pair. Transport 537 protocols typically manage per-application-pair associations (per 538 stream), and network protocols manage per-host-pair and path 539 associations (routing). Round-trip time, MSS, and congestion 540 information could be more appropriately handled in a network-layer 541 fashion, aggregated among concurrent connections, and shared across 542 connection instances [RFC3124]. 544 An earlier version of RTT sharing suggested implementing RTT state 545 at the IP layer, rather than at the TCP layer [Ja86]. Our 546 observations are for sharing state among TCP connections, which 547 avoids some of the difficulties in an IP-layer solution. One such 548 problem is determining the associated prior outgoing packet for an 549 incoming packet, to infer RTT from the exchange. Because RTTs are 550 still determined inside the TCP layer, this is simpler than at the 551 IP layer. This is a case where information should be computed at the 552 transport layer, but could be shared at the network layer. 554 Per-host-pair associations are not the limit of these techniques. It 555 is possible that TCBs could be similarly shared between hosts on a 556 subnet or within a cluster, because the predominant path can be 557 subnet-subnet, rather than host-host. Additionally, TCB 558 interdependence can be applied to any protocol with congestion 559 state, including SCTP [RFC4960] and DCCP [RFC4340], as well as for 560 individual subflows in Multipath TCP [RFC6824]. 562 There may be other information that can be shared between concurrent 563 connections. 
For example, knowing that another connection has just 564 tried to expand its window size and failed, a connection may not 565 attempt to do the same for some period. The idea is that existing 566 TCP implementations infer the behavior of all competing connections, 567 including those within the same host or subnet. One possible 568 optimization is to make that implicit feedback explicit, via 569 extended information associated with the endpoint IP address and its 570 TCP implementation, rather than per-connection state in the TCB. 572 Like the initial version of this document [RFC2140], this update's 573 approach to TCB interdependence focuses on sharing a set of TCBs by 574 updating the TCB state to reduce the impact of transients when 575 connections begin or end. Other mechanisms have since been proposed 576 to continuously share information between all ongoing communication 577 (including connectionless protocols), updating the congestion state 578 during any congestion-related event (e.g., timeout, loss 579 confirmation, etc.) [RFC3124]. By dealing exclusively with 580 transients, TCB interdependence is more likely to exhibit the same 581 behavior as unmodified, independent TCP connections. 583 10. Implementation Observations 585 The observation that some TCB state is host-pair specific rather 586 than application-pair dependent is not new and is a common 587 engineering decision in layered protocol implementations. A 588 discussion of sharing RTT information among protocols layered over 589 IP, including UDP and TCP, occurred in [Ja86]. Although now 590 deprecated, T/TCP was the first to propose using caches in order to 591 maintain TCB states (see Appendix A for more information). 593 The table below describes the current implementation status for some 594 TCB information in Linux kernel version 4.6, FreeBSD 10 and Windows 595 (as of October 2016). In the table, "shared" only refers to temporal 596 sharing. 
598            TCB data        Status
599            -----------------------------------------------------------
600            old_MMS_S       Not shared

602            old_MMS_R       Not shared

604            old_sendMSS     Cached and shared in Linux (MSS)

606            old_PMTU        Cached and shared in FreeBSD and Windows (PMTU)

608            old_RTT         Cached and shared in FreeBSD and Linux

610            old_RTTvar      Cached and shared in FreeBSD

612            old_TFOinfo     Cached and shared in Linux and Windows

614            old_snd_cwnd    Not shared

616            old_ssthresh    Cached and shared in FreeBSD and Linux:
617                            FreeBSD: arithmetic
618                            mean of ssthresh and previous value if
619                            a previous value exists;
620                            Linux: depending on state,
621                            max(cwnd/2, ssthresh) in most cases

623  11. Updates to RFC 2140

625     This document updates the description of TCB sharing in RFC 2140 and
626     its associated impact on existing and new connection state,
627     providing a complete replacement for that document [RFC2140]. It
628     clarifies the previous description and terminology and extends the
629     mechanism to its impact on new protocols and mechanisms, including
630     Multipath TCP, TCP Fast Open, PLPMTUD, NAT, and the TCP
631     Authentication Option.

633     This document also addresses TCB parameters in greater detail: it
634     treats MMS in both the send and receive directions, treats MSS
635     and sendMSS separately, adds path MTU and ssthresh, and addresses
636     the impact on TCP option state.

638     New sections have been added to address compatibility issues and
639     implementation observations. The relation of this work to T/TCP has
640     been moved to an appendix discussion on history, partly to reflect
641     the deprecation of that protocol.

643     Finally, this document updates and significantly expands the
644     referenced literature.

646  12. Security Considerations

648     The implementation methods presented here do not have additional
649     ramifications for explicit attacks. They may be susceptible to
650     denial-of-service attacks if not otherwise secured.
   For example, an application can open a connection and set its
   window size to zero, denying service to any other subsequent
   connection between those hosts.

   TCB sharing may be susceptible to denial-of-service attacks
   wherever the TCB is shared: between connections in a single host,
   or between hosts if TCB sharing is implemented within a subnet (see
   the Implications section). Some shared TCB parameters are used only
   to create new TCBs; others are shared among the TCBs of ongoing
   connections. New connections can join the ongoing set, e.g., to
   optimize send window size among a set of connections to the same
   host.

   Attacks on parameters used only for initialization affect only the
   transient performance of a TCP connection. For short connections,
   the performance ramification can approach that of a denial-of-
   service attack. For example, if an application changes its TCB to
   have a false and small window size, subsequent connections would
   experience performance degradation until their window grew
   appropriately.

13. IANA Considerations

   There are no IANA implications or requests in this document.

   This section should be removed upon final publication as an RFC.

14. References

14.1. Normative References

   [RFC793]  Postel, Jon, "Transmission Control Protocol," Network
             Working Group RFC-793/STD-7, ISI, Sept. 1981.

   [RFC1191] Mogul, J., Deering, S., "Path MTU Discovery," RFC 1191,
             Nov. 1990.

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
             Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2140] Touch, J., "TCP Control Block Interdependence", RFC 2140,
             April 1997.

   [RFC4821] Mathis, M., Heffner, J., "Packetization Layer Path MTU
             Discovery," RFC 4821, Mar. 2007.

   [RFC7413] Cheng, Y., Chu, J., Radhakrishnan, S., Jain, A., "TCP
             Fast Open", RFC 7413, Dec. 2014.

   [RFC8201] McCann, J., Deering, S., Mogul, J., Hinden, R.
             (Ed.), "Path MTU Discovery for IP version 6," RFC 8201,
             Jul. 2017.

14.2. Informative References

   [Br02]    Brownlee, N. and K. Claffy, "Understanding Internet
             Traffic Streams: Dragonflies and Tortoises", IEEE
             Communications Magazine p110-117, 2002.

   [Be94]    Berners-Lee, T., et al., "The World-Wide Web,"
             Communications of the ACM, V37, Aug. 1994, pp. 76-82.

   [Br94]    Braden, B., "T/TCP -- Transaction TCP: Source Changes for
             Sun OS 4.1.3", Release 1.0, USC/ISI, September 14, 1994.

   [Co91]    Comer, D., Stevens, D., Internetworking with TCP/IP, V2,
             Prentice-Hall, NJ, 1991.

   [FreeBSD] FreeBSD source code, Release 2.10,
             http://www.freebsd.org/

   [Ja86]    Jacobson, V., (mail to public list "tcp-ip", no archive
             found), 1986.

   [Du16]    Dukkipati, N., Cheng, Y., and Vahdat, A., "Research
             Impacting the Practice of Congestion Control." ACM
             SIGCOMM CCR (editorial), on-line post, July, 2016.

   [Hu01]    Hughes, A., Touch, J., Heidemann, J., "Issues in Slow-
             Start Restart After Idle", draft-hughes-restart-00
             (expired), Dec., 2001.

   [Hu12]    Hurtig, P., Brunstrom, A., "Enhanced metric caching for
             short TCP flows," 2012 IEEE International Conference on
             Communications (ICC), Ottawa, ON, 2012, pp. 1209-1213.

   [Ba12]    Barik, R., Welzl, M., Ferlin, S., Alay, O., "LISA: A
             Linked Slow-Start Algorithm for MPTCP", IEEE ICC, Kuala
             Lumpur, Malaysia, 23-27 May 2016.

   [RFC1122] Braden, R. (ed), "Requirements for Internet Hosts --
             Communication Layers", RFC-1122, Oct. 1989.

   [RFC1644] Braden, R., "T/TCP -- TCP Extensions for Transactions
             Functional Specification," RFC-1644, July 1994.

   [RFC1379] Braden, R., "Transaction TCP -- Concepts," RFC-1379,
             September 1992.

   [RFC2663] Srisuresh, P., Holdrege, M., "IP Network Address
             Translator (NAT) Terminology and Considerations", RFC-
             2663, August 1999.

   [RFC3390] Allman, M., Floyd, S., Partridge, C., "Increasing TCP's
             Initial Window," RFC 3390, Oct. 2002.

   [RFC7231] Fielding, R., J. Reschke, Eds., "HTTP/1.1 Semantics and
             Content," RFC-7231, June 2014.

   [RFC3124] Balakrishnan, H., Seshan, S., "The Congestion Manager,"
             RFC 3124, June 2001.

   [RFC4340] Kohler, E., Handley, M., Floyd, S., "Datagram Congestion
             Control Protocol (DCCP)," RFC 4340, Mar. 2006.

   [RFC4960] Stewart, R., (Ed.), "Stream Control Transmission
             Protocol," RFC 4960, Sept. 2007.

   [RFC5861] Allman, M., Paxson, V., Blanton, E., "TCP Congestion
             Control," RFC 5681, Sept. 2009.

   [RFC5925] Touch, J., Mankin, A., Bonica, R., "The TCP
             Authentication Option," RFC 5925, June 2010.

   [RFC6824] Ford, A., Raiciu, C., Handley, M., Bonaventure, O., "TCP
             Extensions for Multipath Operation with Multiple
             Addresses," RFC 6824, Jan. 2013.

   [RFC6928] Chu, J., Dukkipati, N., Cheng, Y., Mathis, M.,
             "Increasing TCP's Initial Window," RFC 6928, Apr. 2013.

   [RFC7424] Krishnan, R., Yong, L., Ghanwani, A., So, N., Khasnabish,
             B., "Mechanisms for Optimizing Link Aggregation Group
             (LAG) and Equal-Cost Multipath (ECMP) Component Link
             Utilization in Networks", RFC 7424, Jan. 2015.

   [RFC7540] Belshe, M., Peon, R., Thomson, M., "Hypertext Transfer
             Protocol Version 2 (HTTP/2)", RFC 7540, May 2015.

   [RFC7661] Fairhurst, G., Sathiaseelan, A., Secchi, R., "Updating
             TCP to Support Rate-Limited Traffic", RFC 7661, Oct.
             2015.

15. Acknowledgments

   The authors would like to thank Praveen Balasubramanian for
   information regarding TCB sharing in Windows, and Yuchung Cheng,
   Lars Eggert, Ilpo Jarvinen and Michael Scharf for comments on
   earlier versions of the draft. Earlier revisions of this work
   received funding from a collaborative research project between the
   University of Oslo and Huawei Technologies Co., Ltd. and were
   partly supported by USC/ISI's Postel Center.

   This document was prepared using 2-Word-v2.0.template.dot.

16. Change log

   This section should be removed upon final publication as an RFC.

   06:

   - Changed to update 2140, cite it normatively, and summarize the
      updates in a separate section

   05:

   - Fixed some TBDs.

   04:

   - Removed BCP-style recommendations and fixed some TBDs.

   03:

   - Updated Touch's affiliation and address information

   02:

   - Stated that our OS implementation overview table only covers
      temporal sharing.

   - Correctly reflected sharing of old_RTT in Linux in the
      implementation overview table.

   - Marked entries that are considered safe to share with an
      asterisk (suggestion was to split the table)

   - Discussed correct host identification: NATs may make IP
      addresses the wrong input, could e.g. use HTTP cookie.

   - Included MMS_S and MMS_R from RFC1122; fixed the use of MSS and
      MTU

   - Added information about option sharing, listed options in the
      appendix

Authors' Addresses

   Joe Touch
   Manhattan Beach, CA 90266
   USA

   Phone: +1 (310) 560-0334
   Email: touch@strayalpha.com

   Michael Welzl
   University of Oslo
   PO Box 1080 Blindern
   Oslo N-0316
   Norway

   Phone: +47 22 85 24 20
   Email: michawe@ifi.uio.no

   Safiqul Islam
   University of Oslo
   PO Box 1080 Blindern
   Oslo N-0316
   Norway

   Phone: +47 22 84 08 37
   Email: safiquli@ifi.uio.no

17. Appendix A: TCB sharing history

   T/TCP proposed using caches to maintain TCB information across
   instances (temporal sharing), e.g., smoothed RTT, RTT variance,
   congestion avoidance threshold, and MSS [RFC1644]. These values
   were in addition to connection counts used by T/TCP to accelerate
   data delivery prior to the full three-way handshake during an OPEN.
   The goal was to aggregate TCB components where they reflect one
   association - that of the host-pair, rather than artificially
   separating those components by connection.

   At least one T/TCP implementation saved the MSS and aggregated the
   RTT parameters across multiple connections, but omitted caching the
   congestion window information [Br94], as originally specified in
   [RFC1379]. Some T/TCP implementations immediately updated MSS when
   the TCP MSS header option was received [Br94], although this was
   not addressed specifically in the concepts or functional
   specification [RFC1379][RFC1644]. In later T/TCP implementations,
   RTT values were updated only after a CLOSE, which does not benefit
   concurrent sessions.

   Temporal sharing of cached TCB data was originally implemented in
   the SunOS 4.1.3 T/TCP extensions [Br94] and the FreeBSD port of
   same [FreeBSD]. As mentioned before, only the MSS and RTT
   parameters were cached, as originally specified in [RFC1379]. Later
   discussion of T/TCP suggested including congestion control
   parameters in this cache [RFC1644].

18. Appendix B: Options

   In addition to the options that can be cached and shared, this memo
   also lists all options for which state should *not* be kept. This
   list is meant to avoid work duplication and should be removed upon
   publication.

   Obsolete (MUST NOT keep state):

      ECHO

      ECHO REPLY

      PO Conn permitted

      PO service profile

      CC

      CC.NEW

      CC.ECHO

      Alt CS req

      Alt CS data

   No state to keep:

      EOL

      NOP

      WS

      SACK

      TS

      MD5

      TCP-AO

      EXP1

      EXP2

   MUST NOT keep state:

      Skeeter (DH exchange - might be obsolete, though)
      Bubba (DH exchange - might really be obsolete, though)

      Trailer CS

      SCPS capabilities

      S-NACK

      Records boundaries

      Corruption experienced

      SNAP

      TCP Compression

      Quickstart response

      UTO

      MPTCP (can we cache when this fails?)

      TFO success

   MAY keep state:

      MSS

      TFO failure (so we don't try again, since it's optional)

   MUST keep state:

      TFO cookie (if TFO succeeded in the past)
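
   As an illustration only, the classification above could drive a
   filter that decides which per-option state survives into a cache
   when a connection closes. The sketch below is not taken from any
   implementation: the names Policy, OPTION_POLICY and
   cacheable_state are hypothetical, the dictionary keys are informal
   labels for the list entries, only a subset of the entries is
   shown, and treating unlisted options as "MUST NOT keep state" is
   an assumption made here for safety.

```python
from enum import Enum


class Policy(Enum):
    """State-keeping categories used in the option list (hypothetical names)."""
    OBSOLETE = "obsolete, MUST NOT keep state"
    NO_STATE = "no state to keep"
    MUST_NOT = "MUST NOT keep state"
    MAY = "MAY keep state"
    MUST = "MUST keep state"


# Excerpt of the option list; keys are informal labels, not wire names.
OPTION_POLICY = {
    "ECHO": Policy.OBSOLETE,
    "CC": Policy.OBSOLETE,
    "EOL": Policy.NO_STATE,
    "NOP": Policy.NO_STATE,
    "WS": Policy.NO_STATE,
    "SACK": Policy.NO_STATE,
    "TS": Policy.NO_STATE,
    "TCP-AO": Policy.NO_STATE,
    "UTO": Policy.MUST_NOT,
    "MPTCP": Policy.MUST_NOT,
    "TFO success": Policy.MUST_NOT,
    "MSS": Policy.MAY,
    "TFO failure": Policy.MAY,
    "TFO cookie": Policy.MUST,
}


def cacheable_state(option_state, keep_optional=True):
    """Return the subset of option_state that the list allows caching.

    option_state maps an option label to its per-connection value.
    MUST entries are always kept; MAY entries are kept only when
    keep_optional is true. Unlisted options are dropped (assumption).
    """
    kept = {}
    for name, value in option_state.items():
        policy = OPTION_POLICY.get(name, Policy.MUST_NOT)
        if policy is Policy.MUST:
            kept[name] = value
        elif policy is Policy.MAY and keep_optional:
            kept[name] = value
    return kept
```

   With this sketch, a closing connection holding state for MSS, TS,
   UTO and a TFO cookie would pass on only the MSS and TFO cookie
   entries, or only the TFO cookie if keep_optional is false.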