INTERNET-DRAFT                                               R.Sallantin
Intended Status: Proposed Standard                         CNES/TAS/TESA
Expires: July 19, 2014                                         C.Baudoin
                                                                 F.Arnal
                                                     Thales Alenia Space
                                                                E.Dubois
                                                                    CNES
                                                                E.Chaput
                                                                A.Beylot
                                                                    IRIT
                                                        January 15, 2014

               Safe increase of the TCP's Initial Window
                        Using Initial Spreading
          draft-irtf-iccrg-sallantin-initial-spreading-00.txt

Abstract

   This document proposes a new fast start-up mechanism for TCP that can
   be used to speed up the beginning of an Internet connection and
   thereby improve the performance of short-lived TCP connections.

   Initial Spreading makes it possible to safely increase the Initial
   Window (IW) size in all cases, and notably in congested networks.

   By combining the increase in the IW with the spacing of the segments
   belonging to the IW, Initial Spreading is a very simple mechanism
   that improves the performance of short-lived TCP flows and does not
   deteriorate the performance of long-lived TCP flows.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

Copyright and License Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1  Introduction
   2  Terminology
   3  Initial Spreading mechanism
   4  Implementation considerations
     4.1  Short Round Trip Time
     4.2  Delayed Ack
     4.3  TSO/GSO
     4.4  RTT measure
   5  Open discussions
     5.1  Spacing Interval
     5.2  Increasing the upper bound TCP's IW to more than 10 segments
     5.3  Initial Spreading and LFN
   6  Security Considerations
   7  IANA Considerations
   8  References
     8.1  Normative References
     8.2  Informative References
   Authors' Addresses

1  Introduction

   The long Round Trip Time (RTT) is probably the most detrimental
   constraint in Long Fat Networks (LFN), such as satellite networks,
   notably for short-lived connections, for which the long delay
   significantly degrades regular slow-start performance [FA11]. Several
   protocols and even new network architectures have been proposed to
   deal with this issue. The original idea of Initial Spreading [SB13]
   was to consider a long RTT as a resource to exploit, rather than as a
   constant to bypass. The long RTT can therefore be used as an
   opportunity to safely send a large amount of data during the first
   RTT after the connection has opened. Spacing the data across the
   whole RTT is expected to give each segment a high and independent
   probability of being successfully received.

   This approach resembles a combination of two TCP mechanisms: Pacing
   and an increase in the Initial Window. Both mechanisms have been
   studied in depth in order to design Initial Spreading as an efficient
   fast start-up TCP mechanism that avoids their respective flaws or
   weaknesses.

   The original Pacing idea is to space the segments of the same window
   across an RTT so as to avoid generating bursts as far as possible.
   Hence, each segment arrives separately at the buffer and the impact
   on its queue is minimized. The bit rate can then reach its maximum.
   However, [AS00] has pointed out that this lack of bursts is
   responsible for poor performance: Pacing has a tendency to overload
   the network and then cause a synchronization of the flows, which
   seriously damages both individual and global performance.

   RFC 6928 [RFC6928] suggests enlarging the IW size up to ten
   segments. Several articles and studies have shown that this would
   allow 90% of the connections to be transmitted in one RTT [DR10].
   In most cases, and in particular when the network is not congested,
   this solution is probably the best one for dealing with short-lived
   TCP flows. However, in a congested environment, sending a large IW in
   one burst is likely to impact the buffers and then deteriorate the
   individual connection. Correlation between the segments of the same
   burst is responsible for major impairments for short-lived
   connections, and in particular for connections that can be sent in
   one RTT (number of segments to be transmitted lower than the upper
   bound value of the TCP's IW):

   o  a decrease in the probability of successfully transmitting the
      entire window.

   o  an increase in the probability of successive segment losses.

   o  a significant reduction in the number of potential Duplicate
      Acknowledgements, which are necessary to trigger fast loss
      recovery mechanisms and avoid waiting for a Retransmission
      Timeout.

   In favor of a conservative approach, [RFC3390] recommended the use of
   an IW equal to 3 segments.

   Both mechanisms therefore suffer from a burst-related phenomenon, but
   in opposite ways.

   Initial Spreading has been designed to tackle the burst issues
   described above. Simulations and experiments show that Initial
   Spreading is efficient not only for LFNs but also for other networks
   with a small RTT.

2  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

3  Initial Spreading mechanism

   The Initial Spreading [SB13] mechanism uses the permitted upper bound
   value of the TCP's IW (e.g., RFC 6928 [RFC6928] suggests using 10 for
   this value).
   Initial Spreading spaces out a number of segments lower than or
   equal to this value across the first RTT before letting the TCP
   algorithm continue conventionally:

   (1)  The RTT is measured during the SYN-SYN/ACK exchange.

   (2)  The first RTT is split into n intervals, with n the permitted
        upper bound value of the TCP's IW. Depending on the number of
        segments to be sent, up to n segments are sent, one every RTT/n.

   (3)  After the transmission of the IW, the regular TCP algorithm is
        used.

   Thus, bursts do not downgrade the transmission of short-lived
   connections, but continue to prevent an overload of the network in
   the case of long-lived connections.

4  Implementation considerations

   In this section, we discuss a number of aspects surrounding
   implementations of Initial Spreading.

4.1  Short Round Trip Time

   Whatever the timer implementation, two segments cannot be spaced by
   less than one kernel timer tick (e.g., one jiffy in the Linux
   kernel).

   When the time resulting from the division of one RTT by the upper
   bound value of the TCP's IW is lower than one kernel timer tick,
   Initial Spreading is not activated and TCP uses a regular slow start
   with a large IW.

4.2  Delayed Ack

   The use of Delayed Ack (Del Ack) does not downgrade Initial Spreading
   efficiency.

   Regarding long-lived connections, and notably TCP's steady state, the
   effects of Del Ack are lessened by newer TCP flavors (such as TCP
   Cubic or Compound TCP [HR08][TS06]), which tend to adapt their
   congestion algorithm to take into account whether the receiver uses
   the Del Ack option or not. In doing so, they can prevent the
   connection from being too slow, and still continue to reduce
   acknowledgment traffic. For short-lived connections, the use of Del
   Ack does not modify the transmission of the IW. There is therefore
   no change in the burst propagation.
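
   As an illustration only, the spacing rule of Section 3 and the
   short-RTT fallback of Section 4.1 can be sketched in Python as
   follows. This is a minimal sketch, not the authors' implementation:
   the function name, the use of integer microseconds, and the default
   timer tick of 4000 microseconds (one jiffy on a kernel assumed to
   run at HZ=250) are all assumptions made for this example.

```python
from typing import List


def spreading_schedule_us(rtt_us: int, iw_max: int, segments: int,
                          tick_us: int = 4000) -> List[int]:
    """Return send offsets, in microseconds after the first segment,
    for the segments transmitted during the first RTT.

    rtt_us   -- RTT measured on the SYN-SYN/ACK exchange (step 1)
    iw_max   -- n, the permitted upper bound of the TCP's IW
    segments -- number of segments the sender has ready to send
    tick_us  -- kernel timer granularity (hypothetical default)
    """
    if rtt_us <= 0 or iw_max <= 0 or segments <= 0 or tick_us <= 0:
        raise ValueError("all arguments must be positive")

    # At most n segments belong to the Initial Window.
    count = min(segments, iw_max)

    # Step 2: the first RTT is split into n intervals of RTT/n.
    interval_us = rtt_us // iw_max

    # Section 4.1: if RTT/n is below one timer tick, Initial Spreading
    # is not activated and the IW is sent as a regular burst.
    if interval_us < tick_us:
        return [0] * count

    # One segment every RTT/n; step 3 (regular TCP) follows afterwards.
    return [i * interval_us for i in range(count)]
```

   With a 500 ms RTT, n = 10 and 4 segments to send, the sketch returns
   offsets of 0, 50, 100 and 150 ms. With a 20 ms RTT, RTT/n is 2 ms,
   which is below the assumed tick, so every offset is 0 and the IW is
   sent as a regular burst.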

4.3  TSO/GSO

   TSO/GSO is used to reduce the CPU overhead of TCP/IP on fast
   networks. Instead of doing the segmentation in the kernel, large
   packets are sent to the Network Interface Card (NIC). The
   segmentation is then achieved by the NIC or just before the entry
   into the driver's xmit routine.

   In its current design, Initial Spreading does not work when TSO or
   GSO is activated, but using Initial Spreading with TSO/GSO disabled
   still enables better performance.

   Two options can be foreseen for the joint use of Initial Spreading
   and TSO/GSO:

   (1)  disable TSO/GSO for the first RTT, with no impact on performance
        since the throughput is limited by the IW.

   (2)  implement Initial Spreading using the TCP Offload Engine (TOE)
        [RFC5532].

4.4  RTT measure

   Initial Spreading uses the SYN-SYN/ACK exchange to calculate the
   space between two segments. This measurement may not be perfectly
   accurate in congested networks, where the RTT varies. Two different
   scenarios can then occur:

   o  The measured RTT is greater than the RTT of the first segment of
      the IW, and an ACK arrives before all the segments of the IW have
      been sent. Then, Initial Spreading MUST be stopped and only the
      segment transmission that is triggered by the received ACK is
      done.

   o  The measured RTT is lower than the RTT of the first segment of
      the IW. Consequences are negligible.

5  Open discussions

   In this section, we introduce possible improvements for Initial
   Spreading and new perspectives.

5.1  Spacing Interval

   It has been observed that most of the savings enabled by Initial
   Spreading in congested environments come from the independence of
   the segments sent during the first RTT. Indeed, experiments have
   shown that, by preventing bursts, Initial Spreading gives each
   segment of the IW an independent loss probability.

   Currently, Initial Spreading waits RTT/n seconds between the
   transmissions of two consecutive segments of the IW, with n the
   permitted upper bound value of the TCP's IW.

   This simple mechanism offers very good results but has two minor
   drawbacks:

   (1)  An inaccuracy of the space measurement (cf. Section 4.4).

   (2)  In uncongested networks, Initial Spreading adds an extra delay
        that is equal to (IW-1) * RTT/n, with IW the number of segments
        (<= n) that can be sent during the first RTT.

   A solution could be to set the space not as a fraction of the
   measured RTT but as the minimal space that preserves the independence
   between the sent segments. Preliminary results show that a smaller
   spacing interval may still maintain the independence of the segment
   loss probabilities. This may provide the same performance in
   congested networks and improve the average delay in uncongested
   networks.

5.2  Increasing the upper bound TCP's IW to more than 10 segments

   [DR10] has shown that an IW of 10 segments makes it possible to send
   more than 90% of web objects in one RTT. The authors therefore
   recommend using Initial Spreading as a complement to [RFC6928].

   If the average size of web objects continues to evolve, Initial
   Spreading can be used to raise the IW size. Simulations and
   experiments showed even better results with an IW equal to 12
   segments.

   Thus, Initial Spreading paves the way for a larger IW. Further
   studies are needed to assess the impact on the networks, notably in
   terms of individual performance, fairness, friendliness and global
   performance.

5.3  Initial Spreading and LFN

   The space community designed middleboxes to mitigate poor TCP
   performance for networks with a large RTT [FA11]. Performance
   Enhancing Proxies (PEPs) are generally used in LFNs, and in
   particular in satellite communication systems [RFC3135], and offer
   very good TCP performance.

   Nevertheless, some recent studies have emphasized major impairments
   caused by the use of satellite-specific transport solutions, notably
   TCP-PEPs, in a global context. Breaking the end-to-end TCP semantics,
   which is required to isolate the satellite segment, is notably
   responsible for increased complexity in mobility scenarios or
   security contexts. This strongly reduces the benefits of PEPs and
   reopens the debate on their relevance [DF10].

   Many researchers have pointed out that new TCP releases perform well
   for long-lived TCP connections, even in satellite environments
   [SC12], but continue to suffer from very poor performance for
   short-lived TCP connections.

   Initial Spreading reduces the consequences of the RTT for short-lived
   TCP connections and could be an end-to-end alternative to PEPs.

6  Security Considerations

   The security considerations found in [RFC5681] apply to this
   document. No additional security problems have been identified with
   Initial Spreading at this time.

7  IANA Considerations

   This document contains no IANA considerations.

8  References

8.1  Normative References

   [RFC2119]  S. Bradner, "Key words for use in RFCs to Indicate
              Requirement Levels," BCP 14, RFC 2119, IETF, Best Current
              Practice, March 1997.

   [RFC3390]  M. Allman, S. Floyd, and C. Partridge, "Increasing TCP's
              Initial Window," RFC 3390, IETF, Proposed Standard,
              October 2002.

   [RFC5532]  T. Talpey and C. Juszczak, "Network File System (NFS)
              Remote Direct Memory Access (RDMA) Problem Statement,"
              RFC 5532, IETF, Informational, May 2009.

   [RFC5681]  M. Allman, V. Paxson, and E. Blanton, "TCP Congestion
              Control," RFC 5681, IETF, Draft Standard, September 2009.

   [RFC6928]  J. Chu, N. Dukkipati, Y. Cheng, and M. Mathis, "Increasing
              TCP's Initial Window," RFC 6928, IETF, Experimental,
              April 2013.

   [AH98]     M. Allman, C. Hayes, and S. Ostermann, "An evaluation of
              TCP with Larger Initial Windows," ACM Computer
              Communication Review, 1998.

   [AS00]     A. Aggarwal, S. Savage, and T. Anderson, "Understanding
              the performance of TCP pacing," in INFOCOM, vol. 3, Mar.
              2000, pp. 1157-1165.

   [DR10]     N. Dukkipati, T. Refice, Y. Cheng, J. Chu, T.
              Herbert, A. Agarwal, A. Jain, and N. Sutin, "An Argument
              for Increasing TCP's Initial Congestion Window," SIGCOMM
              Comput. Commun. Rev., vol. 40, no. 3, pp. 26-33, Jun.
              2010.

   [SB13]     R. Sallantin, C. Baudoin, E. Chaput, E. Dubois, F. Arnal,
              and A. Beylot, "Initial spreading: a fast start-up TCP
              mechanism," Proceedings of LCN, 2013.

8.2  Informative References

   [RFC3135]  J. Border, M. Kojo, J. Griner, G. Montenegro, and Z.
              Shelby, "Performance Enhancing Proxies Intended to
              Mitigate Link-Related Degradations," RFC 3135, IETF,
              Informational, June 2001.

   [DF10]     E. Dubois, J. Fasson, C. Donny, and E. Chaput, "Enhancing
              TCP based communications in mobile satellite scenarios:
              TCP PEPs issues and solutions," in Proc. 5th Advanced
              Satellite Multimedia Systems Conference (ASMS) and the
              11th Signal Processing for Space Communications Workshop
              (SPSC), pp. 476-483, 2010.

   [FA11]     A. Fairhurst, G. Arjuna, H. Cruickshank, and C. Baudoin,
              "Transport challenges facing a next generation hybrid
              satellite internet," in International Journal of
              Satellite Communications and Networking, 2011.

   [HR08]     S. Ha, I. Rhee, and L. Xu, "CUBIC: A New TCP-Friendly
              High-Speed TCP Variant," SIGOPS Oper. Syst. Rev., vol. 42,
              no. 5, pp. 64-74, Jul. 2008.

   [LC09]     R. Lacamera, D. Caini, and C. Firrincieli, "Comparative
              performance evaluation of TCP variants on satellite
              environments," in ICC'09 Proceedings of the 2009 IEEE
              International Conference on Communications, pp.
              5161-5165, 2009.

   [SC12]     R. Sallantin, E. Chaput, E. P. Dubois, C. Baudoin, F.
              Arnal, and A.-L. Beylot, "On the sustainability of PEPs
              for satellite Internet access," in ICSSC. AIAA, 2012.

   [TS06]     K. Tan, J. Song, Q. Zhang, and M.
              Sridharan, "Compound TCP: A Scalable and TCP-friendly
              Congestion Control for High-speed Networks," in 4th
              International Workshop on Protocols for Fast Long-Distance
              Networks (PFLDNet), 2006.

Authors' Addresses

   Comments are solicited and should be addressed to the working group's
   mailing list at iccrg@irtf.org and/or the authors:

   Renaud Sallantin
   CNES/TAS/TESA
   IRIT/ENSEEIHT 2, rue Charles Camichel BP 7122
   31071 Toulouse Cedex 7
   France
   Phone: +33 6 48 07 86 44
   Email: renaud.sallantin@gmail.com

   Cedric Baudoin
   Thales Alenia Space (TAS)
   26 Avenue Jean Francois Champollion,
   31100 Toulouse
   France
   Email: cedric.baudoin@thalesaleniaspace.com

   Fabrice Arnal
   Thales Alenia Space
   Email: fabrice.arnal@thalesaleniaspace.com

   Emmanuel Dubois
   Centre National des Etudes Spatiales (CNES)
   18 Avenue Edouard Belin
   31400 Toulouse
   France
   Email: emmanuel.Dubois@cnes.Fr

   Emmanuel Chaput
   IRIT
   IRIT / ENSEEIHT 2, rue Charles Camichel BP 7122
   31071 Toulouse Cedex 7
   France
   Email: emmanuel.chaput@enseeiht.fr

   Andre-Luc Beylot
   IRIT
   Email: andre-Luc.Beylot@enseeiht.fr