INTERNET-DRAFT                                                 Joe Touch
draft-touch-tcp-interdep-00.txt                                      ISI
                                                           June 11, 1996
Expires: Dec. 11, 1996

                   TCP Control Block Interdependence

Status of this Memo

   This document is an Internet-Draft. Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its working groups. Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.
   It is inappropriate to use Internet-Drafts as reference material or
   to cite them other than as ``work in progress.''

   Please check the I-D abstract listing contained in each Internet
   Draft directory to learn the current status of this or any other
   Internet Draft.

   The distribution of this document is unlimited.

Abstract

   This draft makes the case for interdependent TCP control blocks,
   where part of the TCP state is shared among similar concurrent
   connections, or across similar connection instances. TCP state
   includes a combination of parameters, such as connection state,
   current round-trip time estimates, congestion control information,
   and process information. This state is currently maintained on a
   per-connection basis in the TCP control block, but should be shared
   across connections to the same host. The goal is to improve transient
   transport performance, while maintaining backward compatibility with
   existing implementations.

   This document is a product of the LSAM project at ISI. Comments are
   solicited and should be addressed to the author.

Introduction

   TCP is a connection-oriented reliable transport protocol layered over
   IP [9]. Each TCP connection maintains state, usually in a data
   structure called the TCP Control Block (TCB). The TCB contains
   information about the connection state, its associated local process,
   and feedback parameters about the connection's transmission
   properties. As originally specified and usually implemented, the TCB
   is maintained on a per-connection basis. This document discusses the
   implications of that decision, and argues for an alternate
   implementation that shares some of this state across similar
   connection instances and among similar simultaneous connections. The
   resulting implementation can have better transient performance,
   especially for numerous short-lived and simultaneous connections, as
   often used in the World-Wide Web [1].
   These changes affect only TCB initialization, and so have no effect
   on the long-term behavior of TCP after a connection has been
   established.

The TCP Control Block (TCB)

   A TCB is associated with each connection, i.e., with each association
   of a pair of applications across the network. The TCB can be
   summarized as containing [9]:

      Local process state

         pointers to send and receive buffers
         pointers to retransmission queue and current segment
         pointers to Internet Protocol (IP) PCB

      Per-connection shared state

         macro-state

            connection state
            timers
            flags
            local and remote host numbers and ports

         micro-state

            send and receive window state (size*, current number)
            cong. window size*
            cong. window size threshold*
            max windows seen*
            MSS#
            round-trip time and variance#

   The per-connection information is shown as split into macro-state and
   micro-state, terminology borrowed from [5]. Macro-state describes the
   finite state machine; we include the endpoint numbers and components
   (timers, flags) used to help maintain that state. This includes the
   protocol for establishing and maintaining shared state about the
   connection. Micro-state describes the protocol after a connection has
   been established, to maintain the reliability and congestion control
   of the data transferred in the connection.

   We further distinguish two other classes of shared micro-state that
   are associated more with host-pairs than with application pairs. One
   class is clearly host-pair dependent (#, e.g., MSS, RTT), and the
   other is host-pair dependent in its aggregate (*, e.g., congestion
   window information, current window sizes).

TCB Interdependence

   The observation that some TCB state is host-pair specific rather than
   application-pair dependent is not new, and is a common engineering
   decision in layered protocol implementations.
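   The host-pair (#) and aggregate (*) classification above suggests a
   data structure in which the # items exist once per host-pair and
   every TCB refers to them. The * items stay in each TCB but are
   meaningful only as a sum over the host-pair. The Python below is a
   hypothetical sketch of that split, not code from any TCP
   implementation. The names HostPairState, TCB, TCBTable, cwnd and
   members are invented for illustration, and the 536-byte initial MSS
   is an assumed default.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Endpoint = Tuple[str, int]  # (host address, port)


@dataclass
class HostPairState:
    """Micro-state marked '#': a property of the host-pair (hypothetical)."""
    mss: int = 536        # assumption: usual default until an MSS option arrives
    srtt: float = 0.0
    rttvar: float = 0.0
    # Live connections of this host-pair, needed for the '*' (aggregate) items.
    members: List["TCB"] = field(default_factory=list, repr=False, compare=False)

    def aggregate_cwnd(self) -> int:
        """Micro-state marked '*': only the sum over the host-pair matters."""
        return sum(t.cwnd for t in self.members)


@dataclass
class TCB:
    # Macro-state: the state machine plus the endpoints that identify it.
    local: Endpoint
    remote: Endpoint
    state: str = "CLOSED"
    # Per-connection micro-state that remains private.
    snd_nxt: int = 0
    rcv_nxt: int = 0
    cwnd: int = 1         # segments; this connection's part of the '*' sum
    # Shared micro-state, held by reference rather than copied.
    shared: HostPairState = field(
        default_factory=HostPairState, repr=False, compare=False)


class TCBTable:
    """Hypothetical table handing out one HostPairState per host-pair."""

    def __init__(self) -> None:
        self.host_pairs: Dict[Tuple[str, str], HostPairState] = {}

    def open(self, local: Endpoint, remote: Endpoint) -> TCB:
        key = (local[0], remote[0])  # ports deliberately ignored
        shared = self.host_pairs.setdefault(key, HostPairState())
        tcb = TCB(local, remote, state="SYN-SENT", shared=shared)
        shared.members.append(tcb)
        return tcb
```

   Two connections to the same remote host, on any ports, receive the
   same HostPairState object. An MSS learned by one is therefore
   immediately visible to the other, while their cwnd values remain
   separate and can be summed.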

   A discussion of sharing RTT information among protocols layered over
   IP, including UDP and TCP, occurred in [8]. T/TCP uses caches to
   maintain TCB information across instances, e.g., smoothed RTT, RTT
   variance, congestion avoidance threshold, and MSS [3]. These values
   are in addition to the connection counts T/TCP uses to accelerate
   data delivery before the full three-way handshake during an OPEN. The
   goal is to aggregate TCB components where they reflect one
   association, that of the host-pair, rather than artificially
   separating those components by connection.

   At least one current T/TCP implementation saves the MSS and
   aggregates the RTT parameters across multiple connections, but omits
   caching the congestion window information [4], as originally
   specified in [2]. Other values, such as the current window size,
   could also be cached, to give new connections full access to
   accumulated channel resources.

   We observe that there are two cases of TCB interdependence. Temporal
   sharing occurs when the TCB of an earlier (now CLOSED) connection to
   a host is used to initialize some parameters of a new connection to
   that same host. Ensemble sharing occurs when a currently active
   connection to a host is used to initialize another (concurrent)
   connection to that host. The T/TCP documents considered the temporal
   case; we consider both.

An Example of Temporal Sharing

   Temporal sharing of cached TCB data has been implemented in the SunOS
   4.1.3 T/TCP extensions [4] and the FreeBSD port of same [7]. As
   mentioned before, only the MSS and RTT parameters are cached, as
   originally specified in [2]. Later discussion of T/TCP suggested
   including congestion control parameters in this cache [3].

   The cache is accessed in two ways: it is read to initialize new TCBs,
   and written when more current per-host state is available.
   New TCBs are initialized as follows. snd_cwnd reuse is not yet
   implemented, although it is discussed in the T/TCP concepts [2]:

               TEMPORAL SHARING - TCB Initialization

               Cached TCB           New TCB
               ----------------------------------------
               old-MSS              old-MSS

               old-RTT              old-RTT

               old-RTTvar           old-RTTvar

               old-snd_cwnd         old-snd_cwnd (not yet impl.)

   Most cached TCB values are updated when a connection closes. The
   exception is MSS, which is updated whenever the MSS option is
   received in a TCP header.

                    TEMPORAL SHARING - Cache Updates

   Cached TCB     Current TCB     when?    New Cached TCB
   ---------------------------------------------------------------
   old-MSS        curr-MSS        MSSopt   curr-MSS

   old-RTT        curr-RTT        CLOSE    old += (curr - old) >> 2

   old-RTTvar     curr-RTTvar     CLOSE    old += (curr - old) >> 2

   old-snd_cwnd   curr-snd_cwnd   CLOSE    curr-snd_cwnd (not yet impl.)

   MSS caching is trivial: reported values are cached, and the most
   recent value is used. The cache is updated when the MSS option is
   received, so it always holds the most recent MSS value from any
   connection. The cache is consulted only at connection establishment.
   Existing TCBs are not otherwise updated from it, so MSS options
   received on one connection do not affect other current connections.
   The default MSS is never saved; only reported MSS values update the
   cache, so an explicit override is required to reduce the MSS.

   RTT values are updated by a more complicated mechanism [3], [8].
   Dynamic RTT estimation requires a sequence of RTT measurements, but a
   single T/TCP transaction may not accumulate enough samples. As a
   result, when a TCB is closed, the cached RTT (and its variance) is
   averaged with the values in that closing TCB for the same host. RTT
   values are updated only when a connection is closed.
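   The two temporal-sharing tables can be read as a small per-host cache
   with one read path and two write paths. The following Python is a
   minimal sketch under stated assumptions, not the SunOS or FreeBSD
   T/TCP code. The class and method names (TemporalCache, init_tcb,
   on_mss_option, on_close) are hypothetical. RTT values are assumed to
   be integers in clock ticks, so that the ">> 2" update from the table
   applies directly. Copying the first closed connection's values into
   an empty cache is also an assumption. snd_cwnd is omitted, matching
   the "not yet impl." rows.

```python
from dataclasses import dataclass
from typing import Dict, Optional

DEFAULT_MSS = 536  # assumption: the usual TCP default when no MSS is known


@dataclass
class CacheEntry:
    mss: Optional[int] = None     # set only from a received MSS option
    rtt: Optional[int] = None     # smoothed RTT in clock ticks (assumed unit)
    rttvar: Optional[int] = None


@dataclass
class TCB:
    remote: str
    mss: int = DEFAULT_MSS
    rtt: int = 0
    rttvar: int = 0


class TemporalCache:
    """Hypothetical per-remote-host cache following the temporal tables."""

    def __init__(self) -> None:
        self.entries: Dict[str, CacheEntry] = {}

    def init_tcb(self, remote: str) -> TCB:
        # Read path: copy whatever is cached into a fresh TCB.
        tcb = TCB(remote)
        entry = self.entries.get(remote)
        if entry is not None:
            if entry.mss is not None:
                tcb.mss = entry.mss
            if entry.rtt is not None and entry.rttvar is not None:
                tcb.rtt, tcb.rttvar = entry.rtt, entry.rttvar
        return tcb

    def on_mss_option(self, remote: str, mss: int) -> None:
        # MSSopt row: the most recently reported MSS replaces the cached
        # one. The default MSS is never written here.
        self.entries.setdefault(remote, CacheEntry()).mss = mss

    def on_close(self, tcb: TCB) -> None:
        # CLOSE rows: old += (curr - old) >> 2, i.e. the cached value moves
        # roughly a quarter of the way toward the closing TCB's value.
        entry = self.entries.setdefault(tcb.remote, CacheEntry())
        if entry.rtt is None or entry.rttvar is None:
            # Assumption: with nothing cached yet, the values are copied.
            entry.rtt, entry.rttvar = tcb.rtt, tcb.rttvar
        else:
            entry.rtt += (tcb.rtt - entry.rtt) >> 2
            entry.rttvar += (tcb.rttvar - entry.rttvar) >> 2
```

   A TCB opened before an MSS option arrives keeps its own values. Only
   TCBs initialized afterward see the cached MSS, and the cached RTT
   values move a quarter of the way toward each closing connection's
   estimate.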

   Note also that the method used to average the cached RTT values
   differs from the method used to compute RTT values within a
   connection, so the cached value may not be appropriate.

   For temporal sharing, the cache needs updating only when a connection
   closes, because no new TCB is initialized from it until the earlier
   connection has closed. For ensemble sharing, this is not the case, as
   discussed below.

   Other TCB variables, such as the congestion control window
   information, may also be cached between sequential instances. Old
   cache values can be overwritten with the current TCB estimates, or a
   MAX or MIN function can be used to merge the results, depending on
   how optimistic or pessimistic the reuse is meant to be. For example,
   the congestion window can be reused if there are no concurrent
   connections.

An Example of Ensemble Sharing

   Sharing cached TCB data across concurrent connections requires
   attention to the aggregate nature of some of the shared state.
   Although MSS and RTT values can be shared by copying, it may not be
   appropriate to copy congestion window information. At this point, we
   present only the MSS and RTT rules:

               ENSEMBLE SHARING - TCB Initialization

               Cached TCB           New TCB
               ----------------------------------
               old-MSS              old-MSS

               old-RTT              old-RTT

               old-RTTvar           old-RTTvar

                    ENSEMBLE SHARING - Cache Updates

   Cached TCB     Current TCB     when?    New Cached TCB
   -----------------------------------------------------------
   old-MSS        curr-MSS        MSSopt   curr-MSS

   old-RTT        curr-RTT        update   rtt_update(old,curr)

   old-RTTvar     curr-RTTvar     update   rtt_update(old,curr)

   For ensemble sharing, TCB information should be cached as early as
   possible, sometimes before a connection is closed. Otherwise, opening
   multiple concurrent connections may not result in TCB data sharing if
   no connection closes before the others open.
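   The draft specifies rtt_update only by its goal: the shared value
   should match what a single connection would compute from the same
   measurements. One way to meet that goal is to keep one estimator per
   remote host and feed it every RTT sample from every connection as
   each sample arrives. A connection opened while others are still
   active then inherits their measurements. The sketch below makes
   several assumptions. The names (SharedRTT, EnsembleCache, NewTCB,
   on_rtt_sample) are hypothetical. The estimator uses gains of 1/8 for
   the mean and 1/4 for the variance; it is the standard Jacobson/Karels
   estimator, assumed here rather than taken from the draft.

```python
from dataclasses import dataclass
from typing import Dict, Optional

DEFAULT_MSS = 536  # assumption: the usual TCP default when no MSS is known


@dataclass
class SharedRTT:
    """One RTT estimator per remote host, fed by every connection to it."""
    srtt: Optional[float] = None
    rttvar: float = 0.0

    def rtt_update(self, sample: float) -> None:
        # Assumption: Jacobson/Karels smoothing, applied to samples in
        # arrival order exactly as a single connection would apply it.
        if self.srtt is None:
            self.srtt, self.rttvar = sample, sample / 2
        else:
            # Variance uses the previous srtt, so update it first.
            self.rttvar += (abs(sample - self.srtt) - self.rttvar) / 4
            self.srtt += (sample - self.srtt) / 8


@dataclass
class NewTCB:
    mss: int
    srtt: Optional[float]
    rttvar: float


class EnsembleCache:
    """Hypothetical cache written as soon as data arrives, not at CLOSE."""

    def __init__(self) -> None:
        self.mss: Dict[str, int] = {}
        self.rtt: Dict[str, SharedRTT] = {}

    def on_mss_option(self, remote: str, mss: int) -> None:
        self.mss[remote] = mss

    def on_rtt_sample(self, remote: str, sample: float) -> None:
        # Called wherever a single connection would update its own RTT.
        self.rtt.setdefault(remote, SharedRTT()).rtt_update(sample)

    def init_tcb(self, remote: str) -> NewTCB:
        est = self.rtt.get(remote, SharedRTT())
        return NewTCB(self.mss.get(remote, DEFAULT_MSS), est.srtt, est.rttvar)
```

   Because a new TCB here receives a copy of the shared estimate, this
   sketch still keeps RTT state in each TCB. Keeping only a reference to
   the SharedRTT object would move the RTT variables out of the TCB
   entirely.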

   An optimistic solution would be to update cached data as early as
   possible, rather than only when a connection is closing. Some T/TCP
   implementations do this for MSS when the TCP MSS header option is
   received [4], although this is not specifically addressed in the
   concepts or functional specification [2][3].

   In current T/TCP, RTT values are updated only after a CLOSE, which
   does not benefit concurrent sessions. As mentioned in the temporal
   case, averaging values between concurrent connections requires
   incorporating new RTT measurements. The work involved in updating
   the aggregate average should be minimized, but the resulting value
   should be equivalent to having all values measured within a single
   connection. The function "rtt_update" in the ensemble sharing table
   indicates this operation, which occurs whenever the RTT would have
   been updated in the individual TCP connection. As a result, the cache
   contains the shared RTT variables, which no longer need to reside in
   the TCB [8].

   Congestion window size aggregation is more complicated in the
   concurrent case. When there is an ensemble of connections, we need
   to decide how that ensemble would have shared the congestion window,
   in order to derive initial values for new TCBs. Because concurrent
   connections between two hosts usually share network paths, they also
   share whatever capacity exists along those paths. With regard to
   congestion, the set of connections might behave as if it were
   multiplexed prior to TCP, as if all data were part of a single
   connection. As a result, the current window sizes would maintain a
   constant sum, presuming sufficient offered load. This would go beyond
   caching to truly sharing state, as in the RTT case. We note, however,
   that any assumption about how this capacity is shared, including this
   one, may be incorrect.
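   One rule that keeps the sum of congestion windows constant is the
   "fair share" initialization this draft proposes for the ensemble
   case. Suppose N connections hold an aggregate window W and a new
   connection joins. The newcomer receives W/(N+1), and each existing
   connection gives up W/(N+1)/N. The total taken from the existing
   connections therefore equals the newcomer's share, and the sum stays
   W. The Python sketch below rests on several assumptions. The function
   name is hypothetical. Windows are exact fractions of a segment rather
   than byte counts. W is taken to be the sum of the current windows.
   Falling back to one segment for the first connection is a further
   assumption.

```python
from fractions import Fraction
from typing import List


def admit_new_connection(windows: List[Fraction]) -> List[Fraction]:
    """Hypothetical fair-share rule: windows (in segments) after one more
    connection joins, with the newcomer last. The sum is unchanged.

    Assumes W, the aggregate window, is the sum of the current windows.
    """
    n = len(windows)
    if n == 0:
        # No ensemble to borrow from: fall back to one segment.
        return [Fraction(1)]
    total = sum(windows, Fraction(0))
    share = total / (n + 1)      # newcomer's window, W/(N+1)
    give_up = share / n          # taken from each existing one, W/(N+1)/N
    if any(w < give_up for w in windows):
        # The rule implicitly assumes roughly equal windows; otherwise a
        # small window would be driven negative.
        raise ValueError("some window is smaller than W/(N(N+1))")
    return [w - give_up for w in windows] + [share]
```

   Suppose one connection holds 4 segments. A second connection then
   yields windows of 2 and 2, for a sum of 4. Initializing the newcomer
   to one segment would instead raise the sum to 5. A third connection
   yields three windows of 4/3 segment each. The guard reflects a
   limitation of the rule itself: when windows are very unequal, the
   fixed per-connection reduction could drive a small window negative.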

   In current implementations, new congestion windows are set to an
   initial value of one segment, so the sum of the current windows
   increases with each new connection. This can have detrimental
   consequences where several connections share a highly congested
   link, such as in trans-Atlantic Web access.

   There are several ways to initialize the congestion window in a new
   TCB among an ensemble of current connections to a host, as shown
   below. Current TCP implementations initialize it to one segment [9],
   and T/TCP hinted that it should be initialized to the old window size
   [3]. The former assumes that new connections should behave as
   conservatively as possible. The latter makes no accommodation for
   concurrent aggregate behavior.

   In either case, the sum of window sizes can increase, rather than
   remain constant. Another solution is to give each pending connection
   its "fair share" of the available congestion window, and let the
   connections balance from there. The assumption here is that new
   connections are implicit requests for an equal share of available
   link bandwidth, which should be granted at the expense of current
   connections. This may or may not be the appropriate function; we
   propose that it be examined further.

                    ENSEMBLE SHARING - TCB Initialization
                     Some Options for Sharing Window-size

   Cached TCB                        New TCB
   -----------------------------------------------------------------
   old-snd_cwnd      (current)       one segment

                     (T/TCP hint)    old-snd_cwnd

                     (proposed)      old-snd_cwnd/(N+1)
                                     subtract old-snd_cwnd/(N+1)/N
                                     from each concurrent

   (N is the number of concurrent connections to the host.)

                    ENSEMBLE SHARING - Cache Updates

   Cached TCB     Current TCB     when?    New Cached TCB
   ----------------------------------------------------------------
   old-snd_cwnd   curr-snd_cwnd   update   (adjust sum as appropriate)

Compatibility Issues

   Current TCP implementations do not use TCB caching, with the
   exception of T/TCP variants [4][7]. New connections use the default
   initial values of all non-instantiated TCB variables. As a result,
   each connection calculates its own RTT measurements, MSS value, and
   congestion information. Eventually these values are updated for each
   connection.

   For the congestion and current window information, the initial values
   may not be consistent with the long-term aggregate behavior of a set
   of concurrent connections. If a single connection has a window of 4
   segments, new connections assume initial windows of 1 segment (the
   minimum). The current connection's window does not decrease to
   accommodate this additional load, so the aggregate window grows from
   4 to 5 segments. As a result, connections can mutually interfere.
   One example of this has been seen on trans-Atlantic links, where
   concurrent connections supporting Web traffic can collide because
   their initial windows are too large, even when set at one segment.

   Because this proposal attempts to anticipate the aggregate
   steady-state values of TCB state among a group or over time, it
   should avoid the transient effects of new connections. In addition,
   because it considers the ensemble and temporal properties of those
   aggregates, it should also prevent the transients of short-lived or
   multiple concurrent connections from adversely affecting overall
   network performance. We are performing analysis and experiments to
   validate these assumptions.

Performance Considerations

   Here we attempt to optimize transient behavior of TCP without
   modifying its long-term properties.
   The predominant expense is in maintaining the cached values, or in
   using per-host state rather than per-connection state. Where
   performance is affected, however, the per-host information can be
   kept in per-connection copies (as is done now), because higher
   performance should bring less interference between concurrent
   connections.

   To minimize overhead, optimize transient behavior, and minimize the
   effect on the steady state, sharing of TCB state can be restricted to
   connection establishment and close (to update the cache). Sharing
   state during a connection, as in the RTT or window-size variables,
   may also be of benefit, provided its implementation cost is not high.

Implications

   There are several implications to incorporating TCB interdependence
   in TCP implementations. First, it may obviate the need for
   application-layer multiplexing for performance enhancement [6].
   Protocols like persistent-HTTP avoid connection reestablishment costs
   by serializing or multiplexing a set of per-host connections across a
   single TCP connection. This avoids TCP's per-connection OPEN
   handshake, and also avoids recomputing MSS, RTT, and congestion
   windows. Avoiding the so-called "slow-start restart" can improve
   performance. Our proposal provides the MSS, RTT, and OPEN handshake
   avoidance of T/TCP, and the "slow-start restart avoidance" of
   multiplexing, without requiring a multiplexing mechanism at the
   application layer. Application-layer multiplexing will become more
   complicated when quality-of-service mechanisms (e.g., "integrated
   services scheduling") are provided later.

   Second, we are attempting to push some of the TCP implementation from
   the traditional transport layer (in the ISO model [10]) to the
   network layer.
   This acknowledges that some state currently maintained per connection
   is in fact per path, which we simplify as per host-pair. Transport
   protocols typically manage per-application-pair associations (per
   stream), and network protocols manage per-path associations
   (routing). Round-trip time, MSS, and congestion information are more
   appropriately handled in a network-layer fashion, aggregated among
   concurrent connections, and shared across connection instances.

   An earlier version of RTT sharing suggested implementing RTT state at
   the IP layer, rather than at the TCP layer [8]. Our proposal shares
   state among TCP connections, which avoids some of the difficulties of
   an IP-layer solution. One such difficulty is determining which prior
   outgoing packet is associated with an incoming packet, in order to
   infer the RTT from the exchange. Because RTTs are still determined
   inside the TCP layer, this is simpler than at the IP layer. This is a
   case where information should be computed at the transport layer, but
   shared at the network layer.

   We also note that per-host-pair associations are not the limit of
   these techniques. TCBs could possibly be shared in the same way among
   hosts on a LAN, because the predominant path can be LAN-LAN rather
   than host-host.

   There may be other information that can be shared between concurrent
   connections. For example, if a connection knows that another
   connection has just tried to expand its window size and failed, it
   may refrain from doing the same for some period. Existing TCP
   implementations already infer the behavior of all competing
   connections, including those within the same host or LAN. One
   possible optimization is to make that implicit feedback explicit, via
   extended information in the per-host TCP area.

Security Considerations

   Security considerations are not addressed here.

Acknowledgements

   The author would like to thank the members of the High-Performance
   Computing and Communications Division at ISI, notably Bill Manning,
   Bob Braden, Jon Postel, and Ted Faber, for their assistance in the
   development of this draft.

References

   [1]  Berners-Lee, T., et al., "The World-Wide Web," Communications of
        the ACM, V37, Aug. 1994, pp. 76-82.

   [2]  Braden, R., "Transaction TCP -- Concepts," RFC-1379,
        USC/Information Sciences Institute, September 1992.

   [3]  Braden, R., "T/TCP -- TCP Extensions for Transactions Functional
        Specification," RFC-1644, USC/Information Sciences Institute,
        July 1994.

   [4]  Braden, B., "T/TCP -- Transaction TCP: Source Changes for Sun OS
        4.1.3," Release 1.0, USC/ISI, September 14, 1994.

   [5]  Comer, D., and Stevens, D., Internetworking with TCP/IP, V2,
        Prentice-Hall, NJ, 1991.

   [6]  Fielding, R., et al., "Hypertext Transfer Protocol -- HTTP/1.1,"
        (work in progress), 06/04/1996.

   [8]  Jacobson, V., (mail to public list "tcp-ip", no archive found),
        1986.

   [9]  Postel, Jon, "Transmission Control Protocol," Network Working
        Group RFC-793/STD-7, ISI, Sept. 1981.

   [10] Tanenbaum, A., Computer Networks, Prentice-Hall, NJ, 1988.

Author's Address

   Joe Touch
   University of Southern California/Information Sciences Institute
   4676 Admiralty Way
   Marina del Rey, CA 90292-6695
   USA
   Phone: +1 310-822-1511 x151
   Fax:   +1 310-823-6714
   URL:   http://www.isi.edu/~touch
   Email: touch@isi.edu