TCPM WG                                                        J. Touch
Internet Draft
Intended status: Informational                                 M. Welzl
Expires: March 2019                                            S. Islam
                                                     University of Oslo
                                                     September 23, 2018


                   TCP Control Block Interdependence
                    draft-touch-tcpm-2140bis-05.txt


Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   This document may contain material from IETF Documents or IETF
   Contributions published or made publicly available before November
   10, 2008. The person(s) controlling the copyright in some of this
   material may not have granted the IETF Trust the right to allow
   modifications of such material outside the IETF Standards Process.
   Without obtaining an adequate license from the person(s) controlling
   the copyright in such materials, this document may not be modified
   outside the IETF Standards Process, and derivative works of it may
   not be created outside the IETF Standards Process, except to format
   it for publication as an RFC or to translate it into languages other
   than English.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This Internet-Draft will expire on March 23, 2019.

Copyright Notice

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.

Abstract

   This memo describes interdependent TCP control blocks, where part of
   the TCP state is shared among similar concurrent or consecutive
   connections. TCP state includes a combination of parameters, such as
   connection state, current round-trip time estimates, congestion
   control information, and process information.
   Most of this state is maintained on a per-connection basis in the
   TCP Control Block (TCB), but implementations can (and do) share
   certain TCB information across connections to the same host. Such
   sharing is intended to improve overall transient transport
   performance, while maintaining backward-compatibility with existing
   implementations. The sharing described herein is limited to only
   the TCB initialization and so has no effect on the long-term
   behavior of TCP after a connection has been established.

Table of Contents

   1. Introduction
   2. Conventions used in this document
   3. Terminology
   4. The TCP Control Block (TCB)
   5. TCB Interdependence
   6. An Example of Temporal Sharing
   7. An Example of Ensemble Sharing
   8. Compatibility Issues
   9. Implications
   10. Implementation Observations
   11. Security Considerations
   12. IANA Considerations
   13. References
      13.1. Normative References
      13.2. Informative References
   14. Acknowledgments
   15. Change log
   16. Appendix A: TCB sharing history
   17. Appendix B: Options

1. Introduction

   TCP is a connection-oriented reliable transport protocol layered
   over IP [RFC793]. Each TCP connection maintains state, usually in a
   data structure called the TCP Control Block (TCB). The TCB contains
   information about the connection state, its associated local
   process, and feedback parameters about the connection's transmission
   properties. As originally specified and usually implemented, most
   TCB information is maintained on a per-connection basis. Some
   implementations can (and now do) share certain TCB information
   across connections to the same host. Such sharing is intended to
   lead to better overall transient performance, especially for
   numerous short-lived and simultaneous connections, as often used in
   the World-Wide Web [Be94][Br02].

   This document discusses TCB state sharing that affects only the TCB
   initialization, and so has no effect on the long-term behavior of
   TCP after a connection has been established. Path information shared
   across SYN destination port numbers assumes that TCP segments having
   the same host-pair experience the same path properties, irrespective
   of TCP port numbers. The observations about TCB sharing in this
   document apply similarly to any protocol with congestion state,
   including SCTP [RFC4960] and DCCP [RFC4340], as well as to
   individual subflows in Multipath TCP [RFC6824].

2. Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

   In this document, these words will appear with that interpretation
   only when in ALL CAPS. Lower case uses of these words are not to be
   interpreted as carrying significance described in RFC 2119.

   In this document, the characters ">>" preceding one or more
   indented lines indicate a statement using the key words listed
   above. This convention aids reviewers in quickly identifying or
   finding the portions of this RFC covered by these keywords.

3. Terminology

   Host - a source or sink of TCP segments associated with a single IP
   address

   Host-pair - a pair of hosts and their corresponding IP addresses

   Path - an Internet path between the IP addresses of two hosts

4. The TCP Control Block (TCB)

   A TCB describes the data associated with each connection, i.e., with
   each association of a pair of applications across the network. The
   TCB contains at least the following information [RFC793]:

      Local process state
         pointers to send and receive buffers
         pointers to retransmission queue and current segment
         pointers to Internet Protocol (IP) PCB
      Per-connection shared state
         macro-state
            connection state
            timers
            flags
            local and remote host numbers and ports
            TCP option state
         micro-state
            send and receive window state (size*, current number)
            round-trip time and variance
            cong. window size (snd_cwnd)*
            cong. window size threshold (ssthresh)*
            max window size seen*
            sendMSS#
            MMS_S#
            MMS_R#
            PMTU#
            round-trip time and variance#

   The per-connection information is shown as split into macro-state
   and micro-state, terminology borrowed from [Co91]. Macro-state
   describes the finite state machine; we include the endpoint numbers
   and components (timers, flags) used to help maintain that state.
   Macro-state describes the protocol for establishing and maintaining
   shared state about the connection. Micro-state describes the
   protocol after a connection has been established, to maintain the
   reliability and congestion control of the data transferred in the
   connection.
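   As a rough illustration of the micro-state list above, the fields
   can be modeled as a record in which each field carries the sharing
   class implied by its "*" or "#" marker (the two classes are described
   in the next paragraph). This is a minimal sketch, not the layout of
   any particular TCP stack: the field names, including snd_nxt as a
   hypothetical stand-in for the "current number" window state, are
   assumptions; the 536-byte default send MSS is the IPv4 default from
   [RFC1122].

```python
from dataclasses import dataclass, field, fields
from typing import List

# Sharing classes, mirroring the markers in the TCB list (hypothetical names).
HOST_PAIR = "#"  # host-pair dependent: can be copied between connections
AGGREGATE = "*"  # host-pair dependent in aggregate: must be combined
PER_CONN = ""    # per-connection only: never shared


def tagged(kind: str, default):
    """Declare a dataclass field and record its sharing class in metadata."""
    return field(default=default, metadata={"sharing": kind})


@dataclass
class MicroState:
    """Hypothetical micro-state record; field names are illustrative only."""
    snd_wnd: int = tagged(AGGREGATE, 0)       # send window size*
    rcv_wnd: int = tagged(AGGREGATE, 0)       # receive window size*
    snd_cwnd: int = tagged(AGGREGATE, 0)      # cong. window size*
    ssthresh: int = tagged(AGGREGATE, 0)      # cong. window threshold*
    max_wnd_seen: int = tagged(AGGREGATE, 0)  # max window size seen*
    send_mss: int = tagged(HOST_PAIR, 536)    # sendMSS# (IPv4 default)
    mms_s: int = tagged(HOST_PAIR, 0)         # MMS_S#, reported by IP
    mms_r: int = tagged(HOST_PAIR, 0)         # MMS_R#, reported by IP
    pmtu: int = tagged(HOST_PAIR, 0)          # PMTU#
    srtt: float = tagged(HOST_PAIR, 0.0)      # round-trip time#
    rttvar: float = tagged(HOST_PAIR, 0.0)    # round-trip variance#
    snd_nxt: int = tagged(PER_CONN, 0)        # "current number" state


def fields_of(kind: str) -> List[str]:
    """Return the MicroState field names that belong to one sharing class."""
    return [f.name for f in fields(MicroState) if f.metadata["sharing"] == kind]
```

   Here fields_of(HOST_PAIR) returns ['send_mss', 'mms_s', 'mms_r',
   'pmtu', 'srtt', 'rttvar'], the values that could be copied directly
   into a new TCB, while fields_of(AGGREGATE) lists those that need a
   combining function across connections.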

   We further distinguish two other classes of shared micro-state that
   are associated more with host-pairs than with application pairs. One
   class is clearly host-pair dependent (#, e.g., MSS, MMS, PMTU, RTT),
   and the other is host-pair dependent in its aggregate (*, e.g.,
   congestion window information, current window sizes, etc.).

5. TCB Interdependence

   There are two cases of TCB interdependence. Temporal sharing occurs
   when the TCB of an earlier (now CLOSED) connection to a host is used
   to initialize some parameters of a new connection to that same host,
   i.e., in sequence. Ensemble sharing occurs when a currently active
   connection to a host is used to initialize another (concurrent)
   connection to that host.

6. An Example of Temporal Sharing

   The TCB data cache is accessed in two ways: it is read to initialize
   new TCBs and written when more current per-host state is available.
   New TCBs can be initialized using context from past connections as
   follows:

               TEMPORAL SHARING - TCB Initialization

            Cached TCB           New TCB
            --------------------------------------
            old_MMS_S            old_MMS_S or not cached
            old_MMS_R            old_MMS_R or not cached
            old_sendMSS          old_sendMSS
            old_PMTU             old_PMTU
            old_RTT              old_RTT
            old_RTTvar           old_RTTvar
            old_option           (option specific)
            old_ssthresh         old_ssthresh
            old_snd_cwnd         old_snd_cwnd

   Sections 8 and 9 discuss compatibility issues and implications of
   sharing the specific information listed above. Section 10 gives an
   overview of known implementations.

   Most cached TCB values are updated when a connection closes. The
   exceptions are MMS_R and MMS_S, which are reported by IP [RFC1122],
   PMTU, which is updated after Path MTU Discovery
   [RFC1191][RFC4821][RFC8201], and sendMSS, which is updated if the
   MSS option is received in the TCP SYN header.
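   A minimal sketch of the temporal-sharing read path, assuming a
   Python dictionary keyed by the remote host's IP address stands in
   for the TCB data cache. The names (CachedTCB, TemporalCache,
   on_mss_option, on_pmtu, init_tcb) are hypothetical. It follows the
   initialization table and the exceptions listed above: sendMSS and
   PMTU are written as soon as they are reported, cached values
   override the stack's defaults when a new TCB is created, and MMS_S
   and MMS_R are not cached because IP reports them. Values written at
   CLOSE are not handled here.

```python
from dataclasses import asdict, dataclass
from typing import Dict, Optional


@dataclass
class CachedTCB:
    """One cache entry per remote host; None means nothing cached yet."""
    send_mss: Optional[int] = None
    pmtu: Optional[int] = None
    rtt: Optional[float] = None
    rttvar: Optional[float] = None
    ssthresh: Optional[int] = None
    snd_cwnd: Optional[int] = None


class TemporalCache:
    """Hypothetical TCB data cache for temporal sharing, keyed by host IP."""

    def __init__(self) -> None:
        self.entries: Dict[str, CachedTCB] = {}

    def _entry(self, host: str) -> CachedTCB:
        return self.entries.setdefault(host, CachedTCB())

    # Exceptions to update-at-CLOSE: these are written when reported.
    def on_mss_option(self, host: str, mss: int) -> None:
        """An MSS option arrived in a SYN; the most recent value wins."""
        self._entry(host).send_mss = mss

    def on_pmtu(self, host: str, pmtu: int) -> None:
        """PMTUD or PLPMTUD produced a new PMTU; the most recent value wins."""
        self._entry(host).pmtu = pmtu

    def init_tcb(self, host: str, defaults: Dict[str, object]) -> Dict[str, object]:
        """Build a new TCB: start from defaults, then copy any cached values.

        MMS_S and MMS_R are deliberately absent: the local IP stack
        reports them, so caching them adds little.
        """
        tcb = dict(defaults)
        cached = self.entries.get(host)
        if cached is not None:
            for name, value in asdict(cached).items():
                if value is not None:
                    tcb[name] = value
        return tcb
```

   For example, after on_mss_option("192.0.2.1", 1460), a later
   init_tcb("192.0.2.1", defaults) starts with a sendMSS of 1460, while
   connections to other hosts still get the defaults. The default
   sendMSS itself is never written to the cache; only reported values
   are.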

   Sharing sendMSS information affects only data in the SYN of the next
   connection, because sendMSS information is typically included in
   most TCP SYN segments. Caching PMTU can improve the efficiency of
   PMTUD, but can also result in black-holing until corrected if the
   cached value is in error. Caching MMS_R and MMS_S may be of little
   direct value, as they are reported by the local IP stack anyway.

   The way in which other TCP option state can be shared depends on the
   details of that option. E.g., TFO state includes the TCP Fast Open
   Cookie [RFC7413] or, in case TFO fails, a negative TCP Fast Open
   response. RFC 7413 states, "The client MUST cache negative responses
   from the server in order to avoid potential connection failures.
   Negative responses include the server not acknowledging the data in
   the SYN, ICMP error messages, and (most importantly) no response
   (SYN-ACK) from the server at all, i.e., connection timeout."
   [RFC7413]. TFOinfo is cached when a connection is established.

   Other TCP option state might not be as readily cached. E.g., TCP-AO
   [RFC5925] success or failure between a host pair for a single SYN
   destination port might be usefully cached. TCP-AO success or failure
   to other SYN destination ports on that host pair is never useful to
   cache, because TCP-AO security parameters can vary per service.

   The table below gives an overview of option-specific information
   that can be shared.

               TEMPORAL SHARING - Option info

            Cached               New
            ----------------------------------------
            old_TFO_Cookie       old_TFO_Cookie
            old_TFO_Failure      old_TFO_Failure

               TEMPORAL SHARING - Cache Updates

   Cached TCB     Current TCB    when?    New Cached TCB
   ------------------------------------------------------------
   old_MMS_S      curr_MMS_S     OPEN     curr_MMS_S
   old_MMS_R      curr_MMS_R     OPEN     curr_MMS_R
   old_sendMSS    curr_sendMSS   MSSopt   curr_sendMSS
   old_PMTU       curr_PMTU      PMTUD    curr_PMTU
   old_RTT        curr_RTT       CLOSE    merge(curr,old)
   old_RTTvar     curr_RTTvar    CLOSE    merge(curr,old)
   old_option     curr_option    ESTAB    (depends on option)
   old_ssthresh   curr_ssthresh  CLOSE    merge(curr,old)
   old_snd_cwnd   curr_snd_cwnd  CLOSE    merge(curr,old)

   Caching PMTU and sendMSS is trivial; reported values are cached, and
   the most recent values are used. The cache is updated when the MSS
   option is received in a SYN or after PMTUD (i.e., when an ICMPv4
   Fragmentation Needed [RFC1191] or ICMPv6 Packet Too Big message is
   received [RFC8201] or the equivalent is inferred, e.g., as from
   PLPMTUD [RFC4821]), respectively, so the cache always has the most
   recent values from any connection. For sendMSS, the cache is
   consulted only at connection establishment and not otherwise
   updated, which means that MSS options do not affect current
   connections. The default sendMSS is never saved; only reported MSS
   values update the cache, so an explicit override is required to
   reduce the sendMSS. There is no particular benefit to caching MMS_S
   and MMS_R, as these are reported by the local IP stack.

   TCP options are copied or merged depending on the details of each
   option, where "merge" is some function that combines the values of
   "curr" and "old". E.g., TFO state is updated when a connection is
   established and read before establishing a new connection.

   RTT values are updated by a more complicated mechanism
   [RFC1644][Ja86]. Dynamic RTT estimation requires a sequence of RTT
   measurements. As a result, the cached RTT (and its variance) is an
   average of its previous value with the contents of the currently
   active TCB for that host, when a TCB is closed.
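   The close-time update can be sketched as follows. This document
   leaves merge(curr, old) open; the sketch assumes a plain arithmetic
   mean (similar to what Section 10 reports for FreeBSD's ssthresh) and
   falls back to the current value when nothing is cached, as this
   section states for RTT, RTTvar, and ssthresh. The Linux-style
   ssthresh rule is included only as a simplified reading of the
   max(cwnd/2, ssthresh) description in Section 10, ignoring its
   state-dependent cases. All function names are hypothetical.

```python
from typing import Dict, Optional

# Values that the temporal-sharing table merges when a connection closes.
MERGED_AT_CLOSE = ("rtt", "rttvar", "ssthresh", "snd_cwnd")


def merge(curr: float, old: Optional[float]) -> float:
    """Hypothetical merge(curr, old): mean of both, or curr if none cached."""
    if old is None:
        return curr
    return (curr + old) / 2


def linux_like_ssthresh(curr_cwnd: int, curr_ssthresh: int) -> int:
    """Simplified Linux-style value to cache: max(cwnd/2, ssthresh)."""
    return max(curr_cwnd // 2, curr_ssthresh)


def on_close(cache: Dict[str, Dict[str, float]], host: str,
             curr: Dict[str, float]) -> None:
    """Fold a closing connection's state into the per-host cache entry."""
    entry = cache.setdefault(host, {})
    for name in MERGED_AT_CLOSE:
        entry[name] = merge(curr[name], entry.get(name))
```

   Two connections to the same host that close with RTTs of 100 ms and
   300 ms leave a cached RTT of 200 ms. A simple mean weights the most
   recently closed connection as heavily as all earlier ones combined;
   an exponentially weighted merge is one alternative, and the choice
   should aim to keep the transient small for new connections.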

   RTT values are updated only when a connection is closed. The method
   for merging old and current values needs to attempt to reduce the
   transient for new connections.

   The updates for RTT, RTTvar, and ssthresh rely on existing
   information, i.e., old values. Should no such values exist, the
   current values are cached instead.

               TEMPORAL SHARING - Option info Updates

   Cached            Current           when?   New Cached
   ----------------------------------------------------------------
   old_TFO_Cookie    old_TFO_Cookie    ESTAB   old_TFO_Cookie
   old_TFO_Failure   old_TFO_Failure   ESTAB   old_TFO_Failure

7. An Example of Ensemble Sharing

   Sharing cached TCB data across concurrent connections requires
   attention to the aggregate nature of some of the shared state. For
   example, although MSS and RTT values can be shared by copying, it
   may not be appropriate to simply copy congestion window or ssthresh
   information; instead, the new values can be a function (f) of the
   cumulative values and the number of connections (N).

               ENSEMBLE SHARING - TCB Initialization

            Cached TCB           New TCB
            ----------------------------------------
            old_MMS_S            old_MMS_S
            old_MMS_R            old_MMS_R
            old_sendMSS          old_sendMSS
            old_PMTU             old_PMTU
            old_RTT              old_RTT
            old_RTTvar           old_RTTvar
            old_ssthresh sum     f(old_ssthresh sum, N)
            old_snd_cwnd sum     f(old_snd_cwnd sum, N)
            old_option           (option-specific)

   Sections 8 and 9 discuss compatibility issues and implications of
   sharing the specific information listed above.

   The table below gives an overview of option-specific information
   that can be shared.

               ENSEMBLE SHARING - Option info

            Cached               New
            ----------------------------------------
            old_TFO_Cookie       old_TFO_Cookie
            old_TFO_Failure      old_TFO_Failure

               ENSEMBLE SHARING - Cache Updates

   Cached TCB     Current TCB    when?     New Cached TCB
   ----------------------------------------------------------------
   old_MMS_S      curr_MMS_S     OPEN      curr_MMS_S
   old_MMS_R      curr_MMS_R     OPEN      curr_MMS_R
   old_sendMSS    curr_sendMSS   MSSopt    curr_sendMSS
   old_PMTU       curr_PMTU      PMTUD     curr_PMTU
                                 /PLPMTUD
   old_RTT        curr_RTT       update    rtt_update(old,curr)
   old_RTTvar     curr_RTTvar    update    rtt_update(old,curr)
   old_ssthresh   curr_ssthresh  update    adjust sum as appropriate
   old_snd_cwnd   curr_snd_cwnd  update    adjust sum as appropriate
   old_option     curr_option    (depends) (option specific)

   For ensemble sharing, TCB information should be cached as early as
   possible, sometimes before a connection is closed. Otherwise,
   opening multiple concurrent connections may not result in TCB data
   sharing if no connection closes before others open. The amount of
   work involved in updating the aggregate average should be minimized,
   but the resulting value should be equivalent to having all values
   measured within a single connection. The function "rtt_update" in
   the ensemble sharing table indicates this operation, which occurs
   whenever the RTT would have been updated in the individual TCP
   connection. As a result, the cache contains the shared RTT
   variables, which no longer need to reside in the TCB [Ja86].

   Congestion window size and ssthresh aggregation are more complicated
   in the concurrent case. When there is an ensemble of connections, we
   need to decide how that ensemble would have shared these variables,
   in order to derive initial values for new TCBs.

               ENSEMBLE SHARING - Option info Updates

   Cached            Current           when?   New Cached
   ----------------------------------------------------------------
   old_TFO_Cookie    old_TFO_Cookie    ESTAB   old_TFO_Cookie
   old_TFO_Failure   old_TFO_Failure   ESTAB   old_TFO_Failure

   Any assumption of this sharing can be incorrect, because identical
   endpoint address pairs may not share network paths.
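   A minimal sketch of ensemble sharing for one host-pair, under stated
   assumptions. The shared RTT variables are updated by whichever
   connection takes a sample, using the Jacobson/Karels estimator with
   the RFC 6298 gains (alpha = 1/8, beta = 1/4); this is one way to
   realize rtt_update, not necessarily the one intended here. This
   document leaves f(sum, N) open; the sketch assumes a hypothetical,
   conservative f that gives a new connection an equal share of the
   existing window sum, capped at the standard initial window, and
   initializes ssthresh to the mean of the current values. It is not
   the [Ba12] algorithm, and it changes only the new connection's
   initial values.

```python
from typing import Dict, Optional, Tuple


class Ensemble:
    """Hypothetical shared state for concurrent connections to one host."""

    ALPHA = 1 / 8  # SRTT gain (RFC 6298)
    BETA = 1 / 4   # RTTVAR gain (RFC 6298)

    def __init__(self) -> None:
        # Per-connection values are kept so each sum can be adjusted
        # when one member's window changes or a member leaves.
        self.cwnd: Dict[str, int] = {}      # snd_cwnd, in segments
        self.ssthresh: Dict[str, int] = {}  # ssthresh, in segments
        self.srtt: Optional[float] = None   # shared smoothed RTT (seconds)
        self.rttvar: Optional[float] = None

    def join(self, conn: str, iw: int = 10,
             ssthresh0: int = 2**31 - 1) -> Tuple[int, int]:
        """Initialize a new member using a hypothetical f(sum, N)."""
        n = len(self.cwnd)
        if n == 0:
            # Nothing to share yet: initial window, arbitrarily high ssthresh.
            cwnd, ssthresh = iw, ssthresh0
        else:
            share = sum(self.cwnd.values()) // (n + 1)
            cwnd = max(1, min(iw, share))                # equal share, capped
            ssthresh = sum(self.ssthresh.values()) // n  # mean of members
        self.cwnd[conn], self.ssthresh[conn] = cwnd, ssthresh
        return cwnd, ssthresh

    def update(self, conn: str, cwnd: int, ssthresh: int) -> None:
        """Adjust the sums whenever a member's windows change."""
        self.cwnd[conn], self.ssthresh[conn] = cwnd, ssthresh

    def leave(self, conn: str) -> None:
        """Remove a closed member so it no longer counts toward the sums."""
        self.cwnd.pop(conn, None)
        self.ssthresh.pop(conn, None)

    def rtt_update(self, r: float) -> None:
        """Fold an RTT sample from any member into the shared estimator."""
        if self.srtt is None or self.rttvar is None:
            self.srtt, self.rttvar = r, r / 2
        else:
            self.rttvar = ((1 - self.BETA) * self.rttvar
                           + self.BETA * abs(self.srtt - r))
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * r
```

   If the only existing connection to a host has been squeezed to a
   4-segment window on a congested path, a joining connection starts
   at 2 segments rather than at a fixed initial window of 10; on an
   uncongested path the cap keeps it at the usual initial window.
   Because existing windows are not reduced, the aggregate can still
   grow, which is the interaction discussed in Section 8.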

   In current implementations, new congestion windows are set at an
   initial value of 4-10 segments [RFC3390][RFC6928], so that the sum
   of the current windows is increased for any new connection. This can
   have detrimental consequences where several connections share a
   highly congested link.

   There are several ways to initialize the congestion window in a new
   TCB among an ensemble of current connections to a host. Current TCP
   implementations initialize it to four segments as standard [RFC3390]
   and 10 segments experimentally [RFC6928], and T/TCP hinted that it
   should be initialized to the old window size [RFC1644]. In the
   former cases, the assumption is that new connections should behave
   as conservatively as possible. In the latter T/TCP case, no
   accommodation is made for concurrent aggregate behavior. The
   algorithm described in [Ba12] adjusts the initial cwnd depending on
   the cwnd values of ongoing connections.

8. Compatibility Issues

   For the congestion and current window information, the initial
   values computed by TCB interdependence may not be consistent with
   the long-term aggregate behavior of a set of concurrent connections
   between the same endpoints. Under conventional TCP congestion
   control, if a single existing connection has converged to a
   congestion window of 40 segments and two new concurrent connections
   join with initial windows of 10 segments each [RFC6928], the
   aggregate window grows from 40 to 60 segments, because the existing
   connection's window does not decrease to accommodate this additional
   load; the connections can then mutually interfere. One example of
   this is seen on low-bandwidth, high-delay links, where concurrent
   connections supporting Web traffic can collide because their initial
   windows were too large, even when set at one segment.

   The authors of [Hu12] recommend caching ssthresh for temporal sharing
   only when flows are long.
Some studies suggest that sharing 462 ssthresh between short flows can deteriorate the performance of 463 individual connections [Hu12, Du16], although this may benefit 464 aggregate network performance. 466 Due to mechanisms like ECMP and LAG [RFC7424], TCP connections 467 sharing the same host-pair may not always share the same path. This 468 does not matter for host-specific information such as RWIN and TCP 469 option state, such as TFOinfo. When TCB information is shared across 470 different SYN destination ports, path-related information can be 471 incorrect; however, the impact of this error is potentially 472 diminished if (as discussed here) TCB sharing affects only the 473 transient event of a connection start or if TCB information is 474 shared only within connections to the same SYN destination port. In 475 case of Temporal Sharing, TCB information could also become invalid 476 over time. Because this is similar to the case when a connection 477 becomes idle, mechanisms that address idle TCP connections (e.g., 478 [RFC7661]) could also be applied to TCB cache management, especially 479 when TCP Fast Open is used [RFC7413]. 481 There may be additional considerations to the way in which TCB 482 interdependence rebalances congestion feedback among the current 483 connections, e.g., it may be appropriate to consider the impact of a 484 connection being in Fast Recovery [RFC5861] or some other similar 485 unusual feedback state, e.g., as inhibiting or affecting the 486 calculations described herein. 488 TCP is sometimes used in situations where packets of the same host- 489 pair always take the same path. Because ECMP and LAG examine TCP 490 port numbers, they may not be supported when TCP segments are 491 encapsulated, encrypted, or altered - for example, some Virtual 492 Private Networks (VPNs) are known to use proprietary UDP 493 encapsulation methods. Similarly, they cannot operate when the TCP 494 header is encrypted, e.g., when using IPsec ESP. 
TCB interdependence 495 among the entire set sharing the same endpoint IP addresses should 496 work without problems under these circumstances. Moreover, measures 497 to increase the probability that connections use the same path could 498 be applied: e.g., the connections could be given the same IPv6 flow 499 label. TCB interdependence can also be extended to sets of host IP 500 address pairs that share the same network path conditions, such as 501 when a group of addresses is on the same LAN (see Section 9). 503 It can be wrong to share TCB information between TCP connections on 504 the same host as identified by the IP address if an IP address is 505 assigned to a new host (e.g., IP address spinning, as is used by 506 ISPs to inhibit running servers). It can be wrong if Network Address 507 (and Port) Translation (NA(P)T) [RFC2663] or any other IP sharing 508 mechanism is used. Such mechanisms are less likely to be used with 509 IPv6. Other methods to identify a host could also be considered to 510 make correct TCB sharing more likely. Moreover, some TCB information 511 is about dominant path properties rather than the specific host. IP 512 addresses may differ, yet the relevant part of the path may be the 513 same. 515 9. Implications 517 There are several implications to incorporating TCB interdependence 518 in TCP implementations. First, it may reduce the need for 519 application-layer multiplexing for performance enhancement 520 [RFC7231]. Protocols like HTTP/2 [RFC7540] avoid connection 521 reestablishment costs by serializing or multiplexing a set of per- 522 host connections across a single TCP connection. This avoids TCP's 523 per-connection OPEN handshake and also avoids recomputing MSS, RTT, 524 and congestion windows. By avoiding the so-called, "slow-start 525 restart," performance can be optimized. 
TCB interdependece can 526 provide the "slow-start restart avoidance" of multiplexing, without 527 requiring a multiplexing mechanism at the application layer. 529 TCB interdependence pushes some of the TCP implementation from the 530 traditional transport layer (in the ISO model), to the network 531 layer. This acknowledges that some state is in fact per-host-pair or 532 can be per-path as indicated solely by that host-pair. Transport 533 protocols typically manage per-application-pair associations (per 534 stream), and network protocols manage per-host-pair and path 535 associations (routing). Round-trip time, MSS, and congestion 536 information could be more appropriately handled in a network-layer 537 fashion, aggregated among concurrent connections, and shared across 538 connection instances [RFC3124]. 540 An earlier version of RTT sharing suggested implementing RTT state 541 at the IP layer, rather than at the TCP layer [Ja86]. Our 542 observations are for sharing state among TCP connections, which 543 avoids some of the difficulties in an IP-layer solution. One such 544 problem is determining the associated prior outgoing packet for an 545 incoming packet, to infer RTT from the exchange. Because RTTs are 546 still determined inside the TCP layer, this is simpler than at the 547 IP layer. This is a case where information should be computed at the 548 transport layer, but could be shared at the network layer. 550 Per-host-pair associations are not the limit of these techniques. It 551 is possible that TCBs could be similarly shared between hosts on a 552 subnet or within a cluster, because the predominant path can be 553 subnet-subnet, rather than host-host. Additionally, TCB 554 interdependence can be applied to any protocol with congestion 555 state, including SCTP [RFC4960] and DCCP [RFC4340], as well as for 556 individual subflows in Multipath TCP [RFC6824]. 558 There may be other information that can be shared between concurrent 559 connections. 
For example, knowing that another connection has just 560 tried to expand its window size and failed, a connection may not 561 attempt to do the same for some period. The idea is that existing 562 TCP implementations infer the behavior of all competing connections, 563 including those within the same host or subnet. One possible 564 optimization is to make that implicit feedback explicit, via 565 extended information associated with the endpoint IP address and its 566 TCP implementation, rather than per-connection state in the TCB. 568 Like its initial version in 1997, this document's approach to TCB 569 interdependence focuses on sharing a set of TCBs by updating the TCB 570 state to reduce the impact of transients when connections begin or 571 end. Other mechanisms have since been proposed to continuously share 572 information between all ongoing communication (including 573 connectionless protocols), updating the congestion state during any 574 congestion-related event (e.g., timeout, loss confirmation, etc.) 575 [RFC3124]. By dealing exclusively with transients, TCB 576 interdependence is more likely to exhibit the same behavior as 577 unmodified, independent TCP connections. 579 10. Implementation Observations 581 The observation that some TCB state is host-pair specific rather 582 than application-pair dependent is not new and is a common 583 engineering decision in layered protocol implementations. A 584 discussion of sharing RTT information among protocols layered over 585 IP, including UDP and TCP, occurred in [Ja86]. Although now 586 deprecated, T/TCP was the first to propose using caches in order to 587 maintain TCB states (see Appendix A for more information). 589 The table below describes the current implementation status for some 590 TCB information in Linux kernel version 4.6, FreeBSD 10 and Windows 591 (as of October 2016). In the table, "shared" only refers to temporal 592 sharing. 
   TCB data       Status
   -----------------------------------------------------------
   old_MMS_S      Not shared
   old_MMS_R      Not shared
   old_sendMSS    Cached and shared in Linux (MSS)
   old_PMTU       Cached and shared in FreeBSD and Windows (PMTU)
   old_RTT        Cached and shared in FreeBSD and Linux
   old_RTTvar     Cached and shared in FreeBSD
   old_TFOinfo    Cached and shared in Linux and Windows
   old_snd_cwnd   Not shared
   old_ssthresh   Cached and shared in FreeBSD and Linux:
                  FreeBSD: arithmetic mean of ssthresh and the
                  previous value, if a previous value exists;
                  Linux: depending on state, max(cwnd/2, ssthresh)
                  in most cases

11. Security Considerations

The implementation methods presented here do not have additional ramifications for explicit attacks. They may, however, be susceptible to denial-of-service attacks if not otherwise secured. For example, an application can open a connection and set its window size to zero, denying service to any subsequent connection between those hosts.

TCB sharing may be susceptible to denial-of-service attacks wherever the TCB is shared, whether between connections in a single host or between hosts if TCB sharing is implemented within a subnet (see the Implications section). Some shared TCB parameters are used only to create new TCBs; others are shared among the TCBs of ongoing connections. New connections can join the ongoing set, e.g., to optimize send window size among a set of connections to the same host.

Attacks on parameters used only for initialization affect only the transient performance of a TCP connection. For short connections, the performance ramification can approach that of a denial-of-service attack. For example, if an application changes its TCB to have a false and small window size, subsequent connections would experience performance degradation until their windows grew appropriately.

12. IANA Considerations

There are no IANA implications or requests in this document.

This section should be removed upon final publication as an RFC.

13. References

13.1. Normative References

[RFC793] Postel, J., "Transmission Control Protocol", STD 7, RFC 793, September 1981.

[RFC1191] Mogul, J., Deering, S., "Path MTU Discovery", RFC 1191, November 1990.

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC4821] Mathis, M., Heffner, J., "Packetization Layer Path MTU Discovery", RFC 4821, March 2007.

[RFC7413] Cheng, Y., Chu, J., Radhakrishnan, S., Jain, A., "TCP Fast Open", RFC 7413, December 2014.

[RFC8201] McCann, J., Deering, S., Mogul, J., Hinden, R. (Ed.), "Path MTU Discovery for IP version 6", RFC 8201, July 2017.

13.2. Informative References

[Br02] Brownlee, N., Claffy, K., "Understanding Internet Traffic Streams: Dragonflies and Tortoises", IEEE Communications Magazine, pp. 110-117, 2002.

[Be94] Berners-Lee, T., et al., "The World-Wide Web", Communications of the ACM, V37, August 1994, pp. 76-82.

[Br94] Braden, B., "T/TCP -- Transaction TCP: Source Changes for Sun OS 4.1.3", Release 1.0, USC/ISI, September 14, 1994.

[Co91] Comer, D., Stevens, D., "Internetworking with TCP/IP, V2", Prentice-Hall, NJ, 1991.

[FreeBSD] FreeBSD source code, Release 2.10, http://www.freebsd.org/

[Ja86] Jacobson, V., (mail to public list "tcp-ip", no archive found), 1986.

[Du16] Dukkipati, N., Cheng, Y., Vahdat, A., "Research Impacting the Practice of Congestion Control", ACM SIGCOMM CCR (editorial).

[Hu12] Hurtig, P., Brunstrom, A., "Enhanced metric caching for short TCP flows", 2012 IEEE International Conference on Communications (ICC), Ottawa, ON, 2012, pp. 1209-1213.
[Ba12] Barik, R., Welzl, M., Ferlin, S., Alay, O., "LISA: A Linked Slow-Start Algorithm for MPTCP", IEEE ICC, Kuala Lumpur, Malaysia, 23-27 May 2016.

[RFC1122] Braden, R. (Ed.), "Requirements for Internet Hosts -- Communication Layers", RFC 1122, October 1989.

[RFC1644] Braden, R., "T/TCP -- TCP Extensions for Transactions Functional Specification", RFC 1644, July 1994.

[RFC1379] Braden, R., "Extending TCP for Transactions -- Concepts", RFC 1379, November 1992.

[RFC2663] Srisuresh, P., Holdrege, M., "IP Network Address Translator (NAT) Terminology and Considerations", RFC 2663, August 1999.

[RFC3390] Allman, M., Floyd, S., Partridge, C., "Increasing TCP's Initial Window", RFC 3390, October 2002.

[RFC7231] Fielding, R. (Ed.), Reschke, J. (Ed.), "Hypertext Transfer Protocol (HTTP/1.1): Semantics and Content", RFC 7231, June 2014.

[RFC3124] Balakrishnan, H., Seshan, S., "The Congestion Manager", RFC 3124, June 2001.

[RFC4340] Kohler, E., Handley, M., Floyd, S., "Datagram Congestion Control Protocol (DCCP)", RFC 4340, March 2006.

[RFC4960] Stewart, R. (Ed.), "Stream Control Transmission Protocol", RFC 4960, September 2007.

[RFC5681] Allman, M., Paxson, V., Blanton, E., "TCP Congestion Control", RFC 5681, September 2009.

[RFC5925] Touch, J., Mankin, A., Bonica, R., "The TCP Authentication Option", RFC 5925, June 2010.

[RFC6824] Ford, A., Raiciu, C., Handley, M., Bonaventure, O., "TCP Extensions for Multipath Operation with Multiple Addresses", RFC 6824, January 2013.

[RFC6928] Chu, J., Dukkipati, N., Cheng, Y., Mathis, M., "Increasing TCP's Initial Window", RFC 6928, April 2013.

[RFC7424] Krishnan, R., Yong, L., Ghanwani, A., So, N., Khasnabish, B., "Mechanisms for Optimizing Link Aggregation Group (LAG) and Equal-Cost Multipath (ECMP) Component Link Utilization in Networks", RFC 7424, January 2015.

[RFC7540] Belshe, M., Peon, R., Thomson, M., "Hypertext Transfer Protocol Version 2 (HTTP/2)", RFC 7540, May 2015.

[RFC7661] Fairhurst, G., Sathiaseelan, A., Secchi, R., "Updating TCP to Support Rate-Limited Traffic", RFC 7661, October 2015.

14. Acknowledgments

The authors would like to thank Praveen Balasubramanian for information regarding TCB sharing in Windows, and Yuchung Cheng, Lars Eggert, Ilpo Jarvinen, and Michael Scharf for comments on earlier versions of the draft. Earlier revisions of this work received funding from a collaborative research project between the University of Oslo and Huawei Technologies Co., Ltd. and were partly supported by USC/ISI's Postel Center.

This document was prepared using 2-Word-v2.0.template.dot.

15. Change log

05:

- Fixed some TBDs.

03:

- Updated Touch's affiliation and address information.

02:

- Stated that our OS implementation overview table only covers temporal sharing.

- Correctly reflected sharing of old_RTT in Linux in the implementation overview table.

- Marked entries that are considered safe to share with an asterisk (suggestion was to split the table).

- Discussed correct host identification: NATs may make IP addresses the wrong input; an HTTP cookie, for example, could be used instead.

- Included MMS_S and MMS_R from RFC 1122; fixed the use of MSS and MTU.

- Added information about option sharing; listed options in the appendix.

Authors' Addresses

Joe Touch
Manhattan Beach, CA 90266
USA

Phone: +1 (310) 560-0334
Email: touch@strayalpha.com

Michael Welzl
University of Oslo
PO Box 1080 Blindern
Oslo N-0316
Norway

Phone: +47 22 85 24 20
Email: michawe@ifi.uio.no

Safiqul Islam
University of Oslo
PO Box 1080 Blindern
Oslo N-0316
Norway

Phone: +47 22 84 08 37
Email: safiquli@ifi.uio.no

16. Appendix A: TCB sharing history

T/TCP proposed using caches to maintain TCB information across instances (temporal sharing), e.g., smoothed RTT, RTT variance, congestion avoidance threshold, and MSS [RFC1644]. These values were in addition to connection counts used by T/TCP to accelerate data delivery prior to the full three-way handshake during an OPEN. The goal was to aggregate TCB components where they reflect one association, that of the host-pair, rather than artificially separating those components by connection.

At least one T/TCP implementation saved the MSS and aggregated the RTT parameters across multiple connections, but omitted caching the congestion window information [Br94], as originally specified in [RFC1379]. Some T/TCP implementations immediately updated MSS when the TCP MSS header option was received [Br94], although this was not addressed specifically in the concepts or functional specification [RFC1379][RFC1644]. In later T/TCP implementations, RTT values were updated only after a CLOSE, which does not benefit concurrent sessions.

Temporal sharing of cached TCB data was originally implemented in the SunOS 4.1.3 T/TCP extensions [Br94] and the FreeBSD port of the same [FreeBSD]. As mentioned before, only the MSS and RTT parameters were cached, as originally specified in [RFC1379]. Later discussion of T/TCP suggested including congestion control parameters in this cache [RFC1644].

17. Appendix B: Options

In addition to the options that can be cached and shared, this memo also lists all options for which state should *not* be kept. This list is meant to avoid work duplication and should be removed upon publication.
Obsolete (MUST NOT keep state):

   ECHO
   ECHO REPLY
   PO Conn permitted
   PO service profile
   CC
   CC.NEW
   CC.ECHO
   Alt CS req
   Alt CS data

No state to keep:

   EOL
   NOP
   WS
   SACK
   TS
   MD5
   TCP-AO
   EXP1
   EXP2

MUST NOT keep state:

   Skeeter (DH exchange; might be obsolete, though)
   Bubba (DH exchange; might really be obsolete, though)
   Trailer CS
   SCPS capabilities
   S-NACK
   Record boundaries
   Corruption experienced
   SNAP
   TCP Compression
   Quickstart response
   UTO
   MPTCP (can we cache when this fails?)
   TFO success

MAY keep state:

   MSS
   TFO failure (so we don't try again, since it's optional)

MUST keep state:

   TFO cookie (if TFO succeeded in the past)
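The option groups above amount to a per-option caching policy. The Python sketch below is only an illustration of how such a policy might be consulted before option state is stored for temporal sharing. The names Policy, OPTION_POLICY, and HostOptionCache, the keying of the cache by a remote address string, and the choice to treat options missing from the table as MUST NOT are all assumptions made for this example. They are not taken from any implementation described in this memo. The "Obsolete" and "MUST NOT keep state" groups both map to MUST_NOT here.

```python
from enum import Enum


class Policy(Enum):
    """Whether option state may be carried across connections (Appendix B groups)."""
    MUST_NOT = "MUST NOT keep state"  # obsolete options and the explicit MUST NOT group
    NO_STATE = "no state to keep"     # nothing useful to cache
    MAY = "MAY keep state"
    MUST = "MUST keep state"


# Hypothetical labels taken from the Appendix B list; these are not IANA names.
_MUST_NOT = (
    "ECHO", "ECHO REPLY", "PO Conn permitted", "PO service profile",
    "CC", "CC.NEW", "CC.ECHO", "Alt CS req", "Alt CS data",
    "Skeeter", "Bubba", "Trailer CS", "SCPS capabilities", "S-NACK",
    "Record boundaries", "Corruption experienced", "SNAP",
    "TCP Compression", "Quickstart response", "UTO", "MPTCP", "TFO success",
)
_NO_STATE = ("EOL", "NOP", "WS", "SACK", "TS", "MD5", "TCP-AO", "EXP1", "EXP2")

OPTION_POLICY = {name: Policy.MUST_NOT for name in _MUST_NOT}
OPTION_POLICY.update({name: Policy.NO_STATE for name in _NO_STATE})
OPTION_POLICY.update({
    "MSS": Policy.MAY,
    "TFO failure": Policy.MAY,  # avoid retrying TFO, since it is optional
    "TFO cookie": Policy.MUST,  # caller stores it only if TFO succeeded before
})


class HostOptionCache:
    """Hypothetical per-host cache of option state (temporal sharing)."""

    def __init__(self):
        self._entries = {}  # (remote host, option name) -> cached value

    def store(self, host, option, value):
        """Cache option state if the policy allows it; return True if stored."""
        # Assumption: options missing from the table are treated as MUST NOT.
        policy = OPTION_POLICY.get(option, Policy.MUST_NOT)
        if policy in (Policy.MAY, Policy.MUST):
            self._entries[(host, option)] = value
            return True
        return False

    def lookup(self, host, option, default=None):
        """Return cached state for a new connection to host, or default."""
        return self._entries.get((host, option), default)


cache = HostOptionCache()
cache.store("192.0.2.1", "MSS", 1460)   # stored: MAY keep state
cache.store("192.0.2.1", "UTO", 300)    # refused: MUST NOT keep state
print(cache.lookup("192.0.2.1", "MSS"), cache.lookup("192.0.2.1", "UTO"))  # prints: 1460 None
```

Keeping the policy in a single table separates the classification, which is still under discussion (e.g., whether MPTCP failure can be cached), from the cache logic. Revising an option's group then means changing one entry.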