TCPM WG                                                      J. Touch
Internet Draft
Intended status: Informational                               M. Welzl
Expires: July 2018                                           S. Islam
                                                   University of Oslo
                                                               J. You
                                                               Huawei
                                                     January 19, 2018


                  TCP Control Block Interdependence
                   draft-touch-tcpm-2140bis-03.txt


Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   This document may contain material from IETF Documents or IETF
   Contributions published or made publicly available before November
   10, 2008. The person(s) controlling the copyright in some of this
   material may not have granted the IETF Trust the right to allow
   modifications of such material outside the IETF Standards Process.
   Without obtaining an adequate license from the person(s) controlling
   the copyright in such materials, this document may not be modified
   outside the IETF Standards Process, and derivative works of it may
   not be created outside the IETF Standards Process, except to format
   it for publication as an RFC or to translate it into languages other
   than English.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This Internet-Draft will expire on July 19, 2018.

Copyright Notice

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.

Abstract

   This memo describes interdependent TCP control blocks, where part of
   the TCP state is shared among similar concurrent or consecutive
   connections. TCP state includes a combination of parameters, such as
   connection state, current round-trip time estimates, congestion
   control information, and process information.
   Most of this state is maintained on a per-connection basis in the
   TCP Control Block (TCB), but implementations can (and do) share
   certain TCB information across connections to the same host. Such
   sharing is intended to improve overall transient transport
   performance, while maintaining backward compatibility with existing
   implementations. The sharing described herein is limited to TCB
   initialization and so has no effect on the long-term behavior of
   TCP after a connection has been established.

Table of Contents

   1. Introduction
   2. Conventions used in this document
   3. Terminology
   4. The TCP Control Block (TCB)
   5. TCB Interdependence
   6. An Example of Temporal Sharing
   7. An Example of Ensemble Sharing
   8. Compatibility Issues
   9. Implications
   10. Implementation Observations
   11. Security Considerations
   12. IANA Considerations
   13. References
      13.1. Normative References
      13.2. Informative References
   14. Acknowledgments
   15. Change log
   16. Appendix A: TCB sharing history
   17. Appendix B: Options

1. Introduction

   TCP is a connection-oriented reliable transport protocol layered
   over IP [RFC793]. Each TCP connection maintains state, usually in a
   data structure called the TCP Control Block (TCB). The TCB contains
   information about the connection state, its associated local
   process, and feedback parameters about the connection's transmission
   properties. As originally specified and usually implemented, most
   TCB information is maintained on a per-connection basis. Some
   implementations can (and now do) share certain TCB information
   across connections to the same host. Such sharing is intended to
   lead to better overall transient performance, especially for
   numerous short-lived and simultaneous connections, as often used in
   the World-Wide Web [Be94][Br02].

   This document discusses TCB state sharing that affects only TCB
   initialization, and so has no effect on the long-term behavior of
   TCP after a connection has been established. Path information shared
   across SYN destination port numbers assumes that TCP segments having
   the same host-pair experience the same path properties, irrespective
   of TCP port numbers. The observations about TCB sharing in this
   document apply similarly to any protocol with congestion state,
   including SCTP [RFC4960] and DCCP [RFC4340], as well as to
   individual subflows in Multipath TCP [RFC6824].

2. Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

   In this document, these words will appear with that interpretation
   only when in ALL CAPS. Lower case uses of these words are not to be
   interpreted as carrying significance described in RFC 2119.

   In this document, the characters ">>" preceding one or more indented
   lines indicate a statement using the key words listed above. This
   convention aids reviewers in quickly identifying or finding the
   portions of this RFC covered by these keywords.

3. Terminology

   Host - a source or sink of TCP segments associated with a single IP
   address

   Host-pair - a pair of hosts and their corresponding IP addresses

   Path - an Internet path between the IP addresses of two hosts

4. The TCP Control Block (TCB)

   A TCB describes the data associated with each connection, i.e., with
   each association of a pair of applications across the network. The
   TCB contains at least the following information [RFC793]:

      Local process state
         pointers to send and receive buffers
         pointers to retransmission queue and current segment
         pointers to Internet Protocol (IP) PCB
      Per-connection shared state
         macro-state
            connection state
            timers
            flags
            local and remote host numbers and ports
            TCP option state
         micro-state
            send and receive window state (size*, current number)
            round-trip time and variance
            cong. window size (snd_cwnd)*
            cong. window size threshold (ssthresh)*
            max window size seen*
            sendMSS#
            MMS_S#
            MMS_R#
            PMTU#
            round-trip time and variance#

   The per-connection information is shown as split into macro-state
   and micro-state, terminology borrowed from [Co91]. Macro-state
   describes the finite state machine; we include the endpoint numbers
   and components (timers, flags) used to help maintain that state.
   Macro-state describes the protocol for establishing and maintaining
   shared state about the connection. Micro-state describes the
   protocol after a connection has been established, to maintain the
   reliability and congestion control of the data transferred in the
   connection.
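
   The split between macro-state and micro-state, together with the "#"
   and "*" markers in the figure (whose meaning the next paragraph
   explains), can be pictured as a data structure. The sketch below is
   a hypothetical Python model, not the TCB layout of any real stack;
   every class and field name is an illustrative assumption. It groups
   the "#" fields into one object so that they could be shared per
   host-pair, while the "*" fields stay with each connection.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple

# Hypothetical model of the TCB figure above; names are illustrative only.


@dataclass
class HostPairState:
    """Micro-state marked '#': host-pair dependent, a candidate for sharing."""
    send_mss: Optional[int] = None
    mms_s: Optional[int] = None
    mms_r: Optional[int] = None
    pmtu: Optional[int] = None
    srtt_ms: Optional[float] = None     # round-trip time
    rttvar_ms: Optional[float] = None   # round-trip time variance


@dataclass
class TCB:
    # Macro-state: the connection state machine and what maintains it.
    state: str = "CLOSED"
    local: Tuple[str, int] = ("0.0.0.0", 0)    # (IP address, port)
    remote: Tuple[str, int] = ("0.0.0.0", 0)
    timers: dict = field(default_factory=dict)
    flags: set = field(default_factory=set)
    options: dict = field(default_factory=dict)   # TCP option state
    # Micro-state marked '*': host-pair dependent only in aggregate,
    # so each connection keeps its own copy.
    snd_wnd: int = 0
    rcv_wnd: int = 0
    snd_cwnd: int = 0
    ssthresh: int = 0
    max_wnd_seen: int = 0
    # Micro-state marked '#': grouped so it can be shared per host-pair.
    host: HostPairState = field(default_factory=HostPairState)

    def host_pair(self) -> Tuple[str, str]:
        """Sharing key: the two IP addresses, ignoring port numbers."""
        return (self.local[0], self.remote[0])
```

   Two connections from 192.0.2.1 to 198.51.100.7 on different ports
   yield the same host_pair() key, which is what would let a cache hand
   both of them the same host-pair state.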

   We further distinguish two other classes of shared micro-state that
   are associated more with host-pairs than with application pairs. One
   class is clearly host-pair dependent (#, e.g., MSS, MMS, PMTU, RTT),
   and the other is host-pair dependent in its aggregate (*, e.g.,
   congestion window information, current window sizes, etc.).

5. TCB Interdependence

   There are two cases of TCB interdependence. Temporal sharing occurs
   when the TCB of an earlier (now CLOSED) connection to a host is used
   to initialize some parameters of a new connection to that same host,
   i.e., in sequence. Ensemble sharing occurs when a currently active
   connection to a host is used to initialize another (concurrent)
   connection to that host.

6. An Example of Temporal Sharing

   The TCB data cache is accessed in two ways: it is read to initialize
   new TCBs and written when more current per-host state is available.
   New TCBs are initialized using context from past connections as
   follows:

                TEMPORAL SHARING - TCB Initialization

         Safe?    Cached TCB       New TCB
         ----------------------------------------------
         yes      old_MMS_S        old_MMS_S or not cached

         yes      old_MMS_R        old_MMS_R or not cached

         yes      old_sendMSS      old_sendMSS

         yes      old_PMTU         old_PMTU

         TBD      old_RTT          old_RTT

         TBD      old_RTTvar       old_RTTvar

         varies   old_option       (option specific)

         TBD      old_ssthresh     old_ssthresh

         TBD      old_snd_cwnd     old_snd_cwnd

   The "Safe?" column indicates which entries are considered safe to
   share temporally. The remaining entries are discussed in Section 8.

   Most cached TCB values are updated when a connection closes. The
   exceptions are MMS_R and MMS_S, which are reported by IP [RFC1122];
   PMTU, which is updated after Path MTU Discovery
   [RFC1191][RFC4821][RFC8201]; and sendMSS, which is updated if the
   MSS option is received in the TCP SYN header.
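
   The initialization table and update rules above can be sketched as a
   small per-host cache. This is a minimal, hypothetical Python model
   keyed by remote IP address; the class, method, and field names are
   assumptions, not taken from any real stack. By default only the rows
   marked "yes" are handed to a new TCB. The merge() function is only a
   placeholder arithmetic mean: this draft leaves the merge functions
   TBD, and Section 10 reports that FreeBSD uses such a mean for
   ssthresh.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical sketch of a temporal-sharing cache keyed by remote IP
# address. Field and method names are illustrative, not from a real stack.

SAFE_FIELDS = ("mms_s", "mms_r", "send_mss", "pmtu")   # "yes" rows
TBD_FIELDS = ("rtt", "rttvar", "ssthresh", "snd_cwnd")  # "TBD" rows


def merge(curr: float, old: Optional[float]) -> float:
    """Placeholder: the draft leaves the merge function TBD."""
    return curr if old is None else (curr + old) / 2


@dataclass
class TemporalCache:
    entries: Dict[str, dict] = field(default_factory=dict)

    def _entry(self, host: str) -> dict:
        return self.entries.setdefault(host, {})

    def init_tcb(self, host: str, share_tbd: bool = False) -> dict:
        """Values a new TCB to `host` starts with; absent ones use defaults."""
        allowed = SAFE_FIELDS + (TBD_FIELDS if share_tbd else ())
        cached = self.entries.get(host, {})
        return {k: v for k, v in cached.items() if k in allowed}

    # The exceptions: values written as soon as they are reported.
    def on_open(self, host: str, mms_s: int, mms_r: int) -> None:
        self._entry(host).update(mms_s=mms_s, mms_r=mms_r)

    def on_mss_option(self, host: str, mss: int) -> None:
        self._entry(host)["send_mss"] = mss  # only reported MSS values

    def on_pmtu(self, host: str, pmtu: int) -> None:
        self._entry(host)["pmtu"] = pmtu     # most recent value wins

    # Everything else is written back when a connection closes.
    def on_close(self, host: str, **final: float) -> None:
        entry = self._entry(host)
        for name in TBD_FIELDS:
            if name in final:
                entry[name] = merge(final[name], entry.get(name))


if __name__ == "__main__":
    cache = TemporalCache()
    cache.on_mss_option("198.51.100.7", 1460)
    cache.on_close("198.51.100.7", rtt=80, ssthresh=40)   # RTT in ms
    cache.on_close("198.51.100.7", rtt=120, ssthresh=20)
    print(cache.init_tcb("198.51.100.7"))
    # {'send_mss': 1460}
    print(cache.init_tcb("198.51.100.7", share_tbd=True))
    # {'send_mss': 1460, 'rtt': 100.0, 'ssthresh': 30.0}
```

   Outside of CLOSE, only reported values (OPEN-time MMS values, a
   received MSS option, a PMTUD result) are written, so a default
   sendMSS never enters the cache.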

   Sharing sendMSS information affects only data in the SYN of the next
   connection, because the MSS option is typically included in most TCP
   SYN segments, so the value received from the peer supersedes the
   cached one. Caching PMTU can accelerate PMTUD, but can also result in
   black-holing until corrected if the cached value is in error.
   Caching MMS_R and MMS_S may be of little direct value as they are
   reported by the local IP stack anyway.

   [TBD - complete this section with details for TFO and other options
   whose state may, must, or must not be shared] The way in which other
   TCP option state can be shared depends on the details of that
   option. E.g., TFO state includes the TCP Fast Open Cookie [RFC7413]
   or, in case TFO fails, a negative TCP Fast Open response. [RFC7413]
   states: "The client MUST cache negative responses from the server in
   order to avoid potential connection failures. Negative responses
   include the server not acknowledging the data in the SYN, ICMP error
   messages, and (most importantly) no response (SYN-ACK) from the
   server at all, i.e., connection timeout." TFOinfo is cached when a
   connection is established.

   Other TCP option state might not be as readily cached. E.g., TCP-AO
   [RFC5925] success or failure between a host pair for a single SYN
   destination port might be usefully cached. TCP-AO success or failure
   to other SYN destination ports on that host pair is never useful to
   cache because TCP-AO security parameters can vary per service.

   The table below gives an overview of option-specific information
   that is considered safe to share.

                  TEMPORAL SHARING - Option info

         Cached                 New
         ----------------------------------------
         old_TFO_Cookie         old_TFO_Cookie

         old_TFO_Failure        old_TFO_Failure

                  TEMPORAL SHARING - Cache Updates

   Safe?  Cached TCB    Current TCB    when?    New Cached TCB
   -----------------------------------------------------------------
   yes    old_MMS_S     curr_MMS_S     OPEN     curr_MMS_S

   yes    old_MMS_R     curr_MMS_R     OPEN     curr_MMS_R

   yes    old_sendMSS   curr_sendMSS   MSSopt   curr_sendMSS

   yes    old_PMTU      curr_PMTU      PMTUD    curr_PMTU

   TBD    old_RTT       curr_RTT       CLOSE    merge(curr,old)

   TBD    old_RTTvar    curr_RTTvar    CLOSE    merge(curr,old)

   varies old_option    curr option    ESTAB    (depends on option)

   TBD    old_ssthresh  curr_ssthresh  CLOSE    merge(curr,old)

   TBD    old_snd_cwnd  curr_snd_cwnd  CLOSE    merge(curr,old)

   Caching PMTU and sendMSS is trivial; reported values are cached, and
   the most recent values are used. The cache is updated when the MSS
   option is received in a SYN or after PMTUD (i.e., when an ICMPv4
   Fragmentation Needed [RFC1191] or ICMPv6 Packet Too Big message is
   received [RFC8201] or the equivalent is inferred, e.g., as from
   PLPMTUD [RFC4821]), respectively, so the cache always has the most
   recent values from any connection. For sendMSS, the cache is
   consulted only at connection establishment and not otherwise
   updated, which means that MSS options do not affect current
   connections. The default sendMSS is never saved; only reported MSS
   values update the cache, so an explicit override is required to
   reduce the sendMSS. There is no particular benefit to caching MMS_S
   and MMS_R as these are reported by the local IP stack.

   TCP options are copied or merged depending on the details of each
   option. E.g., TFO state is updated when a connection is established
   and read before establishing a new connection.

   RTT values are updated by a more complicated mechanism
   [RFC1644][Ja86]. Dynamic RTT estimation requires a sequence of RTT
   measurements. As a result, the cached RTT (and its variance) is an
   average of its previous value with the contents of the currently
   active TCB for that host, when a TCB is closed.
   RTT values are updated only when a connection is closed. The method
   for merging old and current values needs to attempt to reduce the
   transient for new connections. [THESE MERGE FUNCTIONS NEED TO BE
   SPECIFIED, considering, e.g., [DM16] - TBD].

   The updates for RTT, RTTvar, and ssthresh rely on existing
   information, i.e., old values. Should no such values exist, the
   current values are cached instead.

                TEMPORAL SHARING - Option info Updates

   Cached            Current           when?    New Cached
   ----------------------------------------------------------------
   old_TFO_Cookie    old_TFO_Cookie    ESTAB    old_TFO_Cookie

   old_TFO_Failure   old_TFO_Failure   ESTAB    old_TFO_Failure

7. An Example of Ensemble Sharing

   Sharing cached TCB data across concurrent connections requires
   attention to the aggregate nature of some of the shared state. For
   example, although MSS and RTT values can be shared by copying, it
   may not be appropriate to copy congestion window or ssthresh
   information (see Section 8 for a discussion of congestion window or
   ssthresh sharing).

                ENSEMBLE SHARING - TCB Initialization

         Safe?    Cached TCB       New TCB
         -----------------------------------------
         yes      old_MMS_S        old_MMS_S

         yes      old_MMS_R        old_MMS_R

         yes      old_sendMSS      old_sendMSS

         yes      old_PMTU         old_PMTU

         TBD      old_RTT          old_RTT

         TBD      old_RTTvar       old_RTTvar

         TBD      old_option       (option-specific)

   The "Safe?" column indicates which entries are considered safe to
   share across an ensemble. The remaining entries are discussed in
   Section 8.

   The table below gives an overview of option-specific information
   that is considered safe to share.

                  ENSEMBLE SHARING - Option info

         Cached                 New
         ----------------------------------------
         old_TFO_Cookie         old_TFO_Cookie

         old_TFO_Failure        old_TFO_Failure

                  ENSEMBLE SHARING - Cache Updates

   Safe?  Cached TCB    Current TCB   when?     New Cached TCB
   --------------------------------------------------------------
   yes    old_MMS_S     curr_MMS_S    OPEN      curr_MMS_S

   yes    old_MMS_R     curr_MMS_R    OPEN      curr_MMS_R

   yes    old_sendMSS   curr_sendMSS  MSSopt    curr_sendMSS

   yes    old_PMTU      curr_PMTU     PMTUD     curr_PMTU
                                      /PLPMTUD

   TBD    old_RTT       curr_RTT      update    rtt_update(old,cur)

   TBD    old_RTTvar    curr_RTTvar   update    rtt_update(old,cur)

   varies old_option    curr option   (depends) (option specific)

   For ensemble sharing, TCB information should be cached as early as
   possible, sometimes before a connection is closed. Otherwise,
   opening multiple concurrent connections may not result in TCB data
   sharing if no connection closes before others open. The amount of
   work involved in updating the aggregate average should be minimized,
   but the resulting value should be equivalent to having all values
   measured within a single connection. The function "rtt_update" in
   the ensemble sharing table indicates this operation, which occurs
   whenever the RTT would have been updated in the individual TCP
   connection. As a result, the cache contains the shared RTT
   variables, which no longer need to reside in the TCB [Ja86].

   Congestion window size and ssthresh aggregation are more complicated
   in the concurrent case. When there is an ensemble of connections, we
   need to decide how that ensemble would have shared these variables,
   in order to derive initial values for new TCBs.

                ENSEMBLE SHARING - Option info Updates

   Cached            Current           when?    New Cached
   ----------------------------------------------------------------
   old_TFO_Cookie    old_TFO_Cookie    ESTAB    old_TFO_Cookie

   old_TFO_Failure   old_TFO_Failure   ESTAB    old_TFO_Failure

   Any assumption of this sharing can be incorrect, including this one,
   because identical endpoint address pairs may not share network
   paths.
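
   The rtt_update operation described earlier in this section can be
   sketched as a single estimator per host that every connection to
   that host feeds. Because each sample is folded into the shared state
   in arrival order, the work per sample is constant and the result
   equals what one connection would compute had it measured all the
   samples itself. This draft does not define rtt_update; the
   hypothetical Python sketch below assumes the standard RFC 6298
   SRTT/RTTVAR rules, and its class and method names are illustrative.

```python
from typing import Dict, Optional

# Hypothetical ensemble cache: one RTT estimator per remote host, fed by
# samples from every connection to that host. rtt_update applies the
# RFC 6298 SRTT/RTTVAR rules; that choice is an assumption, since this
# draft does not define rtt_update.

ALPHA = 1 / 8
BETA = 1 / 4


class SharedRtt:
    def __init__(self) -> None:
        self.srtt: Optional[float] = None
        self.rttvar: Optional[float] = None

    def rtt_update(self, sample: float) -> None:
        """Constant work per sample; same state as one connection seeing all."""
        if self.srtt is None:           # first measurement, RFC 6298 (2.2)
            self.srtt = sample
            self.rttvar = sample / 2
        else:                           # later measurements, RFC 6298 (2.3)
            self.rttvar = (1 - BETA) * self.rttvar + BETA * abs(self.srtt - sample)
            self.srtt = (1 - ALPHA) * self.srtt + ALPHA * sample


class EnsembleRttCache:
    def __init__(self) -> None:
        self._by_host: Dict[str, SharedRtt] = {}

    def shared(self, host: str) -> SharedRtt:
        """TCBs keep a reference to this object instead of their own RTT."""
        return self._by_host.setdefault(host, SharedRtt())


if __name__ == "__main__":
    cache = EnsembleRttCache()
    conn_a = cache.shared("198.51.100.7")   # two concurrent connections
    conn_b = cache.shared("198.51.100.7")   # to the same host
    conn_a.rtt_update(100)                  # RTT samples in ms
    conn_b.rtt_update(100)
    print(conn_a.srtt, conn_a.rttvar)       # 100.0 37.5
```

   Holding a reference to the shared estimator, rather than a copy,
   mirrors the point above that the shared RTT variables no longer need
   to reside in each TCB.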

   In current implementations, new congestion windows are set at an
   initial value of 4-10 segments [RFC3390][RFC6928], so that the sum
   of the current windows is increased for any new connection. This can
   have detrimental consequences where several connections share a
   highly congested link.

   There are several ways to initialize the congestion window in a new
   TCB among an ensemble of current connections to a host. Current TCP
   implementations initialize it to at most four segments as standard
   [RFC3390] and 10 segments experimentally [RFC6928], and T/TCP hinted
   that it should be initialized to the old window size [RFC1644]. In
   the former cases, the assumption is that new connections should
   behave as conservatively as possible. In the latter T/TCP case, no
   accommodation is made for concurrent aggregate behavior.

   In either case, the sum of window sizes can increase, rather than
   remain constant. A different approach is to give each pending
   connection its "fair share" of the available congestion window, and
   let the connections balance from there. The assumption we make here
   is that new connections are implicit requests for an equal share of
   available link bandwidth, which should be granted at the expense of
   current connections. [TBD - a new method for safe congestion sharing
   will be described]

8. Compatibility Issues

   For the congestion and current window information, the initial
   values computed by TCB interdependence may not be consistent with
   the long-term aggregate behavior of a set of concurrent connections
   between the same endpoints.
   Under conventional TCP congestion control, if a single existing
   connection has converged to a congestion window of 40 segments and
   two new concurrent connections join with initial windows of 10
   segments each [RFC6928], the existing connection's window does not
   decrease to accommodate this additional load, and the connections
   can mutually interfere. One example of this is seen on
   low-bandwidth, high-delay links, where concurrent connections
   supporting Web traffic can collide because their initial windows
   were too large, even when set at one segment.

   [TBD - this paragraph needs to be revised based on new
   recommendations] Under TCB interdependence, all three connections
   could change to use a congestion window of 12 (rounded down to an
   even number from 13.33, i.e., 40/3). This would include both
   increasing the initial window of the new connections (vs. current
   recommendations [RFC6928]) and decreasing the congestion window of
   the current connection (from 40 down to 12). This gives the new
   connections a larger initial window than allowed by [RFC6928], but
   keeps the aggregate at or below its previous value (3 x 12 = 36
   segments, versus 40). Depending on whether the previous connections
   were in steady state, this can result in more bursty behavior, e.g.,
   when previous connections are idle and new connections commence with
   a large amount of available data to transmit. Additionally, reducing
   the congestion window of an existing connection needs to account for
   the number of packets that are already in flight.

   Because this proposal attempts to anticipate the aggregate
   steady-state values of TCB state among a group or over time, it
   should avoid the transient effects of new connections. In addition,
   because it considers the ensemble and temporal properties of those
   aggregates, it should also prevent the transients of short-lived or
   multiple concurrent connections from adversely affecting the overall
   network performance.
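
   The rebalancing example above (one connection at 40 segments joined
   by two new ones) can be reproduced with a short calculation. This
   hypothetical Python sketch gives every connection an equal share of
   the existing aggregate window, rounded down to an even number as in
   the example, and reports how many segments each existing connection
   already has in flight beyond its new window. The function names and
   the way in-flight data is reported are assumptions; the draft only
   says that in-flight packets need to be accounted for.

```python
from typing import List, Tuple

# Hypothetical sketch of the "fair share" rebalancing in the example above.
# Rounding down to an even number follows that example; how in-flight
# data is reported is an assumption.


def fair_share(aggregate_cwnd: int, n_connections: int) -> int:
    """Equal share of the aggregate window, rounded down to an even number."""
    share = aggregate_cwnd // n_connections
    return share - share % 2


def rebalance(cwnds: List[int], in_flight: List[int],
              n_new: int) -> Tuple[List[int], List[int]]:
    """Windows for existing then new connections, plus per-existing excess.

    Segments already in flight above the new share cannot be recalled; an
    existing connection with such an excess sends nothing new until enough
    ACKs arrive to bring its flight size under the new window.
    """
    n_total = len(cwnds) + n_new
    share = fair_share(sum(cwnds), n_total)
    excess = [max(0, flight - share) for flight in in_flight]
    return [share] * n_total, excess


if __name__ == "__main__":
    # One connection at 40 segments (all in flight); two new ones join.
    print(rebalance([40], [40], 2))   # ([12, 12, 12], [28])
```

   Each of the three connections gets 12 segments, the aggregate drops
   from 40 to 36, and the existing connection has 28 segments above its
   new window that must drain before it sends again.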

   There has been ongoing analysis and experimentation to validate
   these assumptions. For example, [Ph12] recommends caching ssthresh
   for temporal sharing only when flows are long. Sharing ssthresh
   between short flows can degrade the performance of individual
   connections [Ph12][Nd16], although this may benefit overall network
   performance. [TBD - the details of this issue need to be summarized
   and clarified herein].

   [TBD - placeholder for corresponding RTT discussion]

   Due to mechanisms like ECMP and LAG [RFC7424], TCP connections
   sharing the same host-pair may not always share the same path. This
   does not matter for host-specific information such as RWIN and TCP
   option state, such as TFOinfo. When TCB information is shared across
   different SYN destination ports, path-related information can be
   incorrect; however, the impact of this error is potentially
   diminished if (as discussed here) TCB sharing affects only the
   transient event of a connection start or if TCB information is
   shared only within connections to the same SYN destination port. In
   the case of Temporal Sharing, TCB information could also become
   invalid over time. Because this is similar to the case when a
   connection becomes idle, mechanisms that address idle TCP
   connections (e.g., [RFC7661]) could also be applied to TCB cache
   management.

   There may be additional considerations regarding the way in which
   TCB interdependence rebalances congestion feedback among the current
   connections, e.g., it may be appropriate to consider the impact of a
   connection being in Fast Recovery [RFC5681] or some other similar
   unusual feedback state, e.g., as inhibiting or affecting the
   calculations described herein.

   TCP is sometimes used in situations where packets of the same
   host-pair always take the same path.
   Because ECMP and LAG examine TCP port numbers, they may not be
   supported when TCP segments are encapsulated, encrypted, or altered;
   for example, some Virtual Private Networks (VPNs) are known to use
   proprietary UDP encapsulation methods. Similarly, they cannot operate
   when the TCP header is encrypted, e.g., when using IPsec ESP. TCB
   interdependence among the entire set sharing the same endpoint IP
   addresses should work without problems under these circumstances.
   Moreover, measures to increase the probability that connections use
   the same path could be applied: e.g., the connections could be given
   the same IPv6 flow label. TCB interdependence can also be extended
   to sets of host IP address pairs that share the same network path
   conditions, such as when a group of addresses is on the same LAN
   (see Section 9).

   It can be wrong to share TCB information between TCP connections on
   the same host as identified by the IP address if an IP address is
   assigned to a new host (e.g., IP address spinning, as is used by
   ISPs to inhibit running servers). It can also be wrong if Network
   Address (and Port) Translation (NA(P)T) [RFC2663] or any other IP
   sharing mechanism is used. Such mechanisms are less likely to be
   used with IPv6. Other methods to identify a host could also be
   considered to make correct TCB sharing more likely. Moreover, some
   TCB information is about dominant path properties rather than the
   specific host. IP addresses may differ, yet the relevant part of the
   path may be the same.

9. Implications

   There are several implications to incorporating TCB interdependence
   in TCP implementations. First, it may reduce the need for
   application-layer multiplexing for performance enhancement
   [RFC7231]. Protocols like HTTP/2 [RFC7540] avoid connection
   reestablishment costs by serializing or multiplexing a set of
   per-host connections across a single TCP connection.
   This avoids TCP's per-connection OPEN handshake and also avoids
   recomputing MSS, RTT, and congestion windows. By avoiding the
   so-called "slow-start restart," performance can be optimized. TCB
   interdependence can provide the "slow-start restart avoidance" of
   multiplexing, without requiring a multiplexing mechanism at the
   application layer.

   TCB interdependence pushes some of the TCP implementation from the
   traditional transport layer (in the ISO model) to the network
   layer. This acknowledges that some state is in fact per-host-pair or
   can be per-path as indicated solely by that host-pair. Transport
   protocols typically manage per-application-pair associations (per
   stream), and network protocols manage per-host-pair and path
   associations (routing). Round-trip time, MSS, and congestion
   information could be more appropriately handled in a network-layer
   fashion, aggregated among concurrent connections, and shared across
   connection instances [RFC3124].

   An earlier version of RTT sharing suggested implementing RTT state
   at the IP layer, rather than at the TCP layer [Ja86]. Our
   observations are for sharing state among TCP connections, which
   avoids some of the difficulties in an IP-layer solution. One such
   problem is determining the associated prior outgoing packet for an
   incoming packet, to infer RTT from the exchange. Because RTTs are
   still determined inside the TCP layer, this is simpler than at the
   IP layer. This is a case where information should be computed at the
   transport layer, but could be shared at the network layer.

   Per-host-pair associations are not the limit of these techniques. It
   is possible that TCBs could be similarly shared between hosts on a
   subnet or within a cluster, because the predominant path can be
   subnet-subnet, rather than host-host.
   Additionally, TCB interdependence can be applied to any protocol
   with congestion state, including SCTP [RFC4960] and DCCP [RFC4340],
   as well as to individual subflows in Multipath TCP [RFC6824].

   There may be other information that can be shared between concurrent
   connections. For example, knowing that another connection has just
   tried to expand its window size and failed, a connection may not
   attempt to do the same for some period. The idea is that existing
   TCP implementations infer the behavior of all competing connections,
   including those within the same host or subnet. One possible
   optimization is to make that implicit feedback explicit, via
   extended information associated with the endpoint IP address and its
   TCP implementation, rather than per-connection state in the TCB.

   Like its initial version in 1997, this document's approach to TCB
   interdependence focuses on sharing a set of TCBs by updating the TCB
   state to reduce the impact of transients when connections begin or
   end. Other mechanisms have since been proposed to continuously share
   information between all ongoing communication (including
   connectionless protocols), updating the congestion state during any
   congestion-related event (e.g., timeout, loss confirmation, etc.)
   [RFC3124]. By dealing exclusively with transients, TCB
   interdependence is more likely to exhibit the same behavior as
   unmodified, independent TCP connections.

10. Implementation Observations

   The observation that some TCB state is host-pair specific rather
   than application-pair dependent is not new and is a common
   engineering decision in layered protocol implementations. A
   discussion of sharing RTT information among protocols layered over
   IP, including UDP and TCP, occurred in [Ja86]. Although now
   deprecated, T/TCP was the first to propose using caches in order to
   maintain TCB states (see Appendix A for more information).
The table below describes the current implementation status for some TCB information in Linux kernel version 4.6, FreeBSD 10, and Windows (as of October 2016). In the table, "shared" refers only to temporal sharing.

   TCB data       Status
   ------------   ------------------------------------------------
   old_MMS_S      Not shared
   old_MMS_R      Not shared
   old_sendMSS    Cached and shared in Linux (MSS)
   old_PMTU       Cached and shared in FreeBSD and Windows (PMTU)
   old_RTT        Cached and shared in FreeBSD and Linux
   old_RTTvar     Cached and shared in FreeBSD
   old_TFOinfo    Cached and shared in Linux and Windows
   old_snd_cwnd   Not shared
   old_ssthresh   Cached and shared in FreeBSD and Linux:
                  FreeBSD: arithmetic mean of ssthresh and the
                  previous value, if a previous value exists;
                  Linux: depending on state, max(cwnd/2, ssthresh)
                  in most cases

11. Security Considerations

These suggested implementation enhancements do not have additional ramifications for explicit attacks. They may, however, be susceptible to denial-of-service attacks if not otherwise secured. For example, an application can open a connection and set its window size to zero, denying service to any subsequent connection between those hosts.

TCB sharing may be susceptible to denial-of-service attacks wherever the TCB is shared, whether between connections in a single host or between hosts if TCB sharing is implemented within a subnet (see Implications section). Some shared TCB parameters are used only to create new TCBs; others are shared among the TCBs of ongoing connections. New connections can join the ongoing set, e.g., to optimize send window size among a set of connections to the same host.

Attacks on parameters used only for initialization affect only the transient performance of a TCP connection.
For short connections, the performance ramification can approach that of a denial-of-service attack. For example, if an application changes its TCB to have a false and small window size, subsequent connections would experience performance degradation until their window grew appropriately.

The solution is to limit the effect of compromised TCB values. TCBs are compromised when they are modified directly by an application or transmitted between hosts via unauthenticated means (e.g., by using a dirty flag). TCBs that are not compromised by application modification do not have any unique security ramifications. Note that the proposed parameters for TCB sharing are not currently modifiable by an application.

All shared TCBs MUST be validated against default minimum parameters before being used for new connections. This validation would not impact performance, because it occurs only at TCB initialization. It limits the effect of attacks on new connections to reducing the benefit of TCB sharing, resulting in the current default TCP performance. For ongoing connections, the effect of incoming packets on shared information should be both limited and validated against constraints before use. This is a beneficial precaution for existing TCP implementations as well.

TCBs modified by an application SHOULD NOT be shared, unless the new connection sharing the compromised information has been given explicit permission to use such information by the connection API. No mechanism for that indication currently exists, but it could be supported by an augmented API. This sharing restriction SHOULD be implemented in both the host and the subnet. Sharing on a subnet SHOULD utilize authentication to prevent undetected tampering of shared TCB parameters. These restrictions limit the security impact of modified TCBs both for connection initialization and for ongoing connections.
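The validation and sharing restrictions above might be checked at TCB initialization roughly as follows. This is a minimal C sketch under stated assumptions: the names shared_tcb, tcb_limits, and tcb_seed_allowed, the app_modified dirty flag, the api_permits_modified argument (standing in for the augmented API, which does not yet exist), and the upper bound on MSS are all hypothetical. On rejection the caller uses default initialization, so a bad entry costs only the benefit of sharing.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Shared state carries performance values only (no sequence numbers etc.). */
struct shared_tcb {
    uint32_t mss;          /* bytes */
    uint32_t rtt_us;       /* smoothed RTT, microseconds */
    uint32_t ssthresh;     /* bytes */
    bool     app_modified; /* hypothetical dirty flag: set when an application altered the values */
};

/* Hypothetical local limits; a real stack would take these from its configuration. */
struct tcb_limits {
    uint32_t min_mss;       /* e.g., 536 bytes, the IPv4 default send MSS */
    uint32_t max_mss;       /* e.g., derived from the local interface MTU */
    uint32_t min_rtt_us;    /* floor below which a cached RTT is implausible */
    uint32_t min_ssthresh;  /* e.g., 2 * MSS */
};

/* Decide whether shared values may seed a new TCB. On false, the caller falls
 * back to default initialization, so performance is never worse than default. */
bool tcb_seed_allowed(const struct shared_tcb *s, const struct tcb_limits *lim,
                      bool api_permits_modified)
{
    if (s == NULL)
        return false;
    /* Application-modified TCBs are not shared without explicit API permission. */
    if (s->app_modified && !api_permits_modified)
        return false;
    if (s->mss < lim->min_mss || s->mss > lim->max_mss)
        return false;
    if (s->rtt_us < lim->min_rtt_us)
        return false;
    /* A falsely small value here is exactly the attack described above. */
    if (s->ssthresh < lim->min_ssthresh)
        return false;
    return true;
}
```

Rejecting the whole entry, rather than clamping individual fields, keeps the fallback simple: any implausible value reverts the new connection to the current default TCP behavior.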
Finally, shared values MUST be limited to performance factors only. Other information, such as TCP sequence numbers, is already known to compromise security when shared.

12. IANA Considerations

There are no IANA implications or requests in this document.

This section should be removed upon final publication as an RFC.

13. References

13.1. Normative References

[RFC793] Postel, J., "Transmission Control Protocol", STD 7, RFC 793, September 1981.

[RFC1191] Mogul, J., Deering, S., "Path MTU Discovery", RFC 1191, November 1990.

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC4821] Mathis, M., Heffner, J., "Packetization Layer Path MTU Discovery", RFC 4821, March 2007.

[RFC7413] Cheng, Y., Chu, J., Radhakrishnan, S., Jain, A., "TCP Fast Open", RFC 7413, December 2014.

[RFC8201] McCann, J., Deering, S., Mogul, J., Hinden, R. (Ed.), "Path MTU Discovery for IP version 6", RFC 8201, July 2017.

13.2. Informative References

[Br02] Brownlee, N., Claffy, K., "Understanding Internet Traffic Streams: Dragonflies and Tortoises", IEEE Communications Magazine, pp. 110-117, 2002.

[Be94] Berners-Lee, T., et al., "The World-Wide Web", Communications of the ACM, Vol. 37, pp. 76-82, August 1994.

[Br94] Braden, B., "T/TCP -- Transaction TCP: Source Changes for Sun OS 4.1.3", Release 1.0, USC/ISI, September 14, 1994.

[Co91] Comer, D., Stevens, D., "Internetworking with TCP/IP", Vol. 2, Prentice-Hall, NJ, 1991.

[FreeBSD] FreeBSD source code, Release 2.10, http://www.freebsd.org/

[Ja86] Jacobson, V., (mail to public list "tcp-ip", no archive found), 1986.

[Nd16] Dukkipati, N., Cheng, Y., Vahdat, A., "Research Impacting the Practice of Congestion Control", ACM SIGCOMM CCR (editorial).
[DM16] Matz, D., "Optimize TCP's Minimum Retransmission Timeout for Low Latency Environments", Master's thesis, Technical University Munich, 2016.

[Ph12] Hurtig, P., Brunstrom, A., "Enhanced metric caching for short TCP flows", 2012 IEEE International Conference on Communications (ICC), Ottawa, ON, pp. 1209-1213, 2012.

[RFC1122] Braden, R. (Ed.), "Requirements for Internet Hosts -- Communication Layers", RFC 1122, October 1989.

[RFC1644] Braden, R., "T/TCP -- TCP Extensions for Transactions Functional Specification", RFC 1644, July 1994.

[RFC1379] Braden, R., "Transaction TCP -- Concepts", RFC 1379, September 1992.

[RFC2663] Srisuresh, P., Holdrege, M., "IP Network Address Translator (NAT) Terminology and Considerations", RFC 2663, August 1999.

[RFC3390] Allman, M., Floyd, S., Partridge, C., "Increasing TCP's Initial Window", RFC 3390, October 2002.

[RFC7231] Fielding, R., Reschke, J. (Eds.), "Hypertext Transfer Protocol (HTTP/1.1): Semantics and Content", RFC 7231, June 2014.

[RFC3124] Balakrishnan, H., Seshan, S., "The Congestion Manager", RFC 3124, June 2001.

[RFC4340] Kohler, E., Handley, M., Floyd, S., "Datagram Congestion Control Protocol (DCCP)", RFC 4340, March 2006.

[RFC4960] Stewart, R. (Ed.), "Stream Control Transmission Protocol", RFC 4960, September 2007.

[RFC5681] Allman, M., Paxson, V., Blanton, E., "TCP Congestion Control", RFC 5681, September 2009.

[RFC5925] Touch, J., Mankin, A., Bonica, R., "The TCP Authentication Option", RFC 5925, June 2010.

[RFC6824] Ford, A., Raiciu, C., Handley, M., Bonaventure, O., "TCP Extensions for Multipath Operation with Multiple Addresses", RFC 6824, January 2013.

[RFC6928] Chu, J., Dukkipati, N., Cheng, Y., Mathis, M., "Increasing TCP's Initial Window", RFC 6928, April 2013.
[RFC7424] Krishnan, R., Yong, L., Ghanwani, A., So, N., Khasnabish, B., "Mechanisms for Optimizing Link Aggregation Group (LAG) and Equal-Cost Multipath (ECMP) Component Link Utilization in Networks", RFC 7424, January 2015.

[RFC7540] Belshe, M., Peon, R., Thomson, M., "Hypertext Transfer Protocol Version 2 (HTTP/2)", RFC 7540, May 2015.

[RFC7661] Fairhurst, G., Sathiaseelan, A., Secchi, R., "Updating TCP to Support Rate-Limited Traffic", RFC 7661, October 2015.

14. Acknowledgments

The authors would like to thank Praveen Balasubramanian for information regarding TCB sharing in Windows, and Yuchung Cheng, Lars Eggert, Ilpo Jarvinen, and Michael Scharf for comments on earlier versions of the draft. This work has received funding from a collaborative research project between the University of Oslo and Huawei Technologies Co., Ltd., and is partly supported by USC/ISI's Postel Center.

This document was prepared using 2-Word-v2.0.template.dot.

15. Change log

03:

- Updated Touch's affiliation and address information

02:

- Stated that our OS implementation overview table only covers temporal sharing.

- Correctly reflected sharing of old_RTT in Linux in the implementation overview table.

- Marked entries that are considered safe to share with an asterisk (suggestion was to split the table)

- Discussed correct host identification: NATs may make IP addresses the wrong input; an HTTP cookie could be used instead, for example.
- Included MMS_S and MMS_R from RFC 1122; fixed the use of MSS and MTU

- Added information about option sharing, listed options in the appendix

Authors' Addresses

Joe Touch
Manhattan Beach, CA 90266
USA

Phone: +1 (310) 560-0334
Email: touch@strayalpha.com

Michael Welzl
University of Oslo
PO Box 1080 Blindern
Oslo N-0316
Norway

Phone: +47 22 85 24 20
Email: michawe@ifi.uio.no

Safiqul Islam
University of Oslo
PO Box 1080 Blindern
Oslo N-0316
Norway

Phone: +47 22 84 08 37
Email: safiquli@ifi.uio.no

Jianjie You
Huawei
101 Software Avenue, Yuhua District
Nanjing 210012
China

Email: youjianjie@huawei.com

16. Appendix A: TCB sharing history

T/TCP proposed using caches to maintain TCB information across instances (temporal sharing), e.g., smoothed RTT, RTT variance, congestion avoidance threshold, and MSS [RFC1644]. These values were in addition to connection counts used by T/TCP to accelerate data delivery prior to the full three-way handshake during an OPEN. The goal was to aggregate TCB components where they reflect one association (that of the host-pair), rather than artificially separating those components by connection.

At least one T/TCP implementation saved the MSS and aggregated the RTT parameters across multiple connections, but omitted caching the congestion window information [Br94], as originally specified in [RFC1379]. Some T/TCP implementations immediately updated MSS when the TCP MSS header option was received [Br94], although this was not addressed specifically in the concepts or functional specification [RFC1379][RFC1644]. In later T/TCP implementations, RTT values were updated only after a CLOSE, which does not benefit concurrent sessions.
Temporal sharing of cached TCB data was originally implemented in the SunOS 4.1.3 T/TCP extensions [Br94] and the FreeBSD port of the same [FreeBSD]. As mentioned before, only the MSS and RTT parameters were cached, as originally specified in [RFC1379]. Later discussion of T/TCP suggested including congestion control parameters in this cache [RFC1644].

17. Appendix B: Options

In addition to the options that can be cached and shared, this memo also lists all options for which state should *not* be kept. This list is meant to avoid duplicated work and should be removed upon publication.

Obsolete (MUST NOT keep state):

   ECHO
   ECHO REPLY
   PO Conn permitted
   PO service profile
   CC
   CC.NEW
   CC.ECHO
   Alt CS req
   Alt CS data

No state to keep:

   EOL
   NOP
   WS
   SACK
   TS
   MD5
   TCP-AO
   EXP1
   EXP2

MUST NOT keep state:

   Skeeter (DH exchange; might be obsolete)
   Bubba (DH exchange; might be obsolete)
   Trailer CS
   SCPS capabilities
   S-NACK
   Record boundaries
   Corruption experienced
   SNAP
   TCP Compression
   Quickstart response
   UTO
   MPTCP (can we cache when this fails?)
   TFO success

MAY keep state:

   MSS
   TFO failure (so we don't try again, since it's optional)

MUST keep state:

   TFO cookie (if TFO succeeded in the past)
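The lists above could be encoded as a lookup table that a stack consults before writing option-derived state into a shared cache. The sketch below is a hypothetical C transcription, not part of any implementation: the names cache_policy, option_cache_policy, and may_cache are illustrative, the item strings are this appendix's labels rather than IANA option kind numbers (TFO appears as separate outcome-specific items), and treating unlisted items as MUST NOT keep state is a conservative assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum cache_policy { POLICY_NO_STATE, POLICY_MUST_NOT, POLICY_MAY, POLICY_MUST };

struct policy_row {
    const char       *item;    /* label as used in this appendix */
    enum cache_policy policy;
};

static const struct policy_row policy_table[] = {
    /* Obsolete: MUST NOT keep state */
    { "ECHO", POLICY_MUST_NOT }, { "ECHO REPLY", POLICY_MUST_NOT },
    { "PO Conn permitted", POLICY_MUST_NOT }, { "PO service profile", POLICY_MUST_NOT },
    { "CC", POLICY_MUST_NOT }, { "CC.NEW", POLICY_MUST_NOT }, { "CC.ECHO", POLICY_MUST_NOT },
    { "Alt CS req", POLICY_MUST_NOT }, { "Alt CS data", POLICY_MUST_NOT },
    /* No state to keep */
    { "EOL", POLICY_NO_STATE }, { "NOP", POLICY_NO_STATE }, { "WS", POLICY_NO_STATE },
    { "SACK", POLICY_NO_STATE }, { "TS", POLICY_NO_STATE }, { "MD5", POLICY_NO_STATE },
    { "TCP-AO", POLICY_NO_STATE }, { "EXP1", POLICY_NO_STATE }, { "EXP2", POLICY_NO_STATE },
    /* MUST NOT keep state */
    { "Skeeter", POLICY_MUST_NOT }, { "Bubba", POLICY_MUST_NOT },
    { "Trailer CS", POLICY_MUST_NOT }, { "SCPS capabilities", POLICY_MUST_NOT },
    { "S-NACK", POLICY_MUST_NOT }, { "Record boundaries", POLICY_MUST_NOT },
    { "Corruption experienced", POLICY_MUST_NOT }, { "SNAP", POLICY_MUST_NOT },
    { "TCP Compression", POLICY_MUST_NOT }, { "Quickstart response", POLICY_MUST_NOT },
    { "UTO", POLICY_MUST_NOT }, { "MPTCP", POLICY_MUST_NOT },
    { "TFO success", POLICY_MUST_NOT },
    /* MAY keep state */
    { "MSS", POLICY_MAY }, { "TFO failure", POLICY_MAY },
    /* MUST keep state */
    { "TFO cookie", POLICY_MUST },
};

/* Linear lookup; unlisted items default to MUST NOT (conservative assumption). */
enum cache_policy option_cache_policy(const char *item)
{
    for (size_t i = 0; i < sizeof policy_table / sizeof policy_table[0]; i++)
        if (strcmp(policy_table[i].item, item) == 0)
            return policy_table[i].policy;
    return POLICY_MUST_NOT;
}

/* True if state derived from this item may be written to the shared cache. */
bool may_cache(const char *item)
{
    enum cache_policy p = option_cache_policy(item);
    return p == POLICY_MAY || p == POLICY_MUST;
}
```

A linear scan is adequate here because the table is small and, in this sketch, is consulted only when a connection's state is written to the cache.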