httpbis                                                      D. Stenberg
Internet-Draft                                                   Mozilla
Intended status: Best Current Practice                 December 21, 2015
Expires: June 23, 2016

                          TCP Tuning for HTTP
                     draft-stenberg-httpbis-tcp-01

Abstract

   This document records current best practice for using all versions of
   HTTP over TCP.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on June 23, 2016.

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Notational Conventions
   2.  Socket planning
     2.1.  Number of open files
     2.2.  Number of concurrent network messages
     2.3.  Number of incoming TCP SYNs allowed to backlog
     2.4.  Use the whole port range for local ports
     2.5.  Lower the TCP FIN timeout
     2.6.  Reuse sockets in TIME_WAIT state
     2.7.  TCP socket buffer sizes and Window Scaling
     2.8.  Set maximum allowed TCP window sizes
     2.9.  Timers and timeouts
   3.  TCP handshake
     3.1.  TCP Fast Open
     3.2.  Initial Congestion Window
     3.3.  TCP SYN flood handling
   4.  TCP transfers
     4.1.  Packet Pacing
     4.2.  Explicit Congestion Control
     4.3.  Nagle's Algorithm
     4.4.  Keep-alive
   5.  Re-using connections
     5.1.  Slow Start after Idle
     5.2.  TCP-Bound Authentications
   6.  Closing connections
     6.1.  Half-close
     6.2.  Abort
     6.3.  Close Idle Connections
     6.4.  Tail Loss Probes
   7.  IANA Considerations
   8.  Security Considerations
   9.  References
     9.1.  Normative References
     9.2.  Informative References
     9.3.  URIs
   Appendix A.  Acknowledgments
   Author's Address

1.  Introduction

   HTTP version 1.1 [RFC7230] as well as HTTP version 2 [RFC7540] are
   defined to use TCP [RFC0793], and their performance can depend
   greatly upon how TCP is configured. This document records the best
   current practice for using HTTP over TCP, with a focus on improving
   end-user perceived performance.

   These practices are generally applicable to HTTP/1 as well as HTTP/2,
   although some may note particular impact or nuance regarding a
   particular protocol version.

   There are countless scenarios, roles and setups where HTTP is being
   used, so there can be no single specific "Right Answer" to most TCP
   questions. This document intends only to cover the most important
   areas of concern and suggest possible actions.

1.1.  Notational Conventions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

2.  Socket planning

   Your HTTP server or intermediary may need configuration changes to
   some system tunables and timeout periods to perform optimally.
   Actual values will depend on how you are scaling the platform,
   horizontally or vertically, and other connection semantics. Changing
   system limits and altering thresholds will change the behavior of
   your web service and its dependencies. These dependencies are
   usually common to other services running on the same system, so good
   planning and testing is advised.

   This is a list of values to consider and some general advice on how
   those values can be modified on Linux systems.

2.1.  Number of open files

   A modern HTTP server will serve a large number of TCP connections and
   in most systems each open socket equals an open file. Make sure that
   limit isn't a bottleneck. In Linux, the limit can be raised like
   this:

     fs.file-max =

2.2.  Number of concurrent network messages

   Raise the number of packets allowed to get queued when a particular
   interface receives packets faster than the kernel can process them.
   In Linux, this limit can be raised like this:

     net.core.netdev_max_backlog =

2.3.  Number of incoming TCP SYNs allowed to backlog

   This is the number of new connection requests that are allowed to
   queue up in the kernel. In Linux, this limit can be raised like
   this:

     net.core.somaxconn =

2.4.  Use the whole port range for local ports

   To make sure the TCP stack can take full advantage of the entire set
   of possible sockets, give it a larger range of local port numbers to
   use.

     net.ipv4.ip_local_port_range = 1024 65535

2.5.  Lower the TCP FIN timeout

   High connection completion rates will consume ephemeral ports
   quickly. Lower the time during which connections are in FIN-WAIT-2/
   TIME_WAIT states so that they can be purged faster and thus maintain
   a maximal number of available sockets. The primitives for the
   assignment of these values were described in [RFC0793]; however,
   significantly lower values are commonly used.

     net.ipv4.tcp_fin_timeout =

2.6.  Reuse sockets in TIME_WAIT state

   When running backend servers on a managed, low latency network you
   might allow the reuse of sockets in TIME_WAIT state for new
   connections once a complete protocol termination has occurred. There
   is no RFC that covers this behaviour.

     net.ipv4.tcp_tw_reuse = 1

2.7.  TCP socket buffer sizes and Window Scaling

   Systems meant to handle and serve a huge number of TCP connections at
   high speeds need a significant amount of memory for TCP socket
   buffers. On some systems you can tell the TCP stack what default
   buffer sizes to use and how much they are allowed to dynamically grow
   and shrink. Window Scaling is typically linked to socket buffer
   sizes and on a Linux system can be controlled with the values:

     net.ipv4.tcp_wmem =
     net.ipv4.tcp_rmem =

   The minimum and default tend to require less proactive amendment than
   the maximum value. When deriving maximum values for use, you should
   consider the BDP (Bandwidth Delay Product) of the target environment
   and clients. Consider also that 'read' and 'write' values do not
   need to be synchronised, as the BDP requirements for a load
   balancer or middle-box might be very different when acting as a
   sender or receiver.

   Allowing needlessly high values beyond the expected limitations of
   the platform might increase the probability of retransmissions and
   buffer induced delays within the path. Extensions such as ECN
   coupled with AQM can help mitigate this undesirable behaviour
   [RFC7141].

   [RFC7323] covers Window Scaling in greater detail.

2.8.  Set maximum allowed TCP window sizes

   You may have to increase the largest allowed window size. Window
   scaling must be accommodated within the maximal values; however, it
   is not uncommon to see the maximum definable higher than the scalable
   limit. These values can be statically defined within socket
   parameters (SO_RCVBUF, SO_SNDBUF).

     net.core.rmem_max =
     net.core.wmem_max =

2.9.  Timers and timeouts

   On a modern shared platform it can be common to plan for both long
   and short lived connections on the same implementation. However, the
   delivery of static assets and a 'web push' or 'long poll' service
   provide very different quality of service promises.

   Fail 'fast': TCP resources can be highly contended. For fault
   tolerance reasons a server needs to be able to determine within a
   reasonable time frame whether a connection is still active or
   required. For example, if static assets typically return in hundreds
   of milliseconds and users 'switch off' after less than 10 seconds,
   keeping timeouts of more than 30 seconds makes little sense; defining
   a 'quality of service' appropriate to the target platform is
   encouraged. On a shared platform with mixed session lifetimes,
   applications that require longer render times have various options to
   ensure the underlying service and upstream servers in the path can
   identify the session as not failed: HTTP continuations, redirects,
   202s or sending data.

   Clients and servers typically have many timeout options; a few
   notable ones are: connect (client), time to request (server), time
   to first byte (client), between bytes (server/client) and total
   connection time (server/client). Some implementations merge these
   values into a single 'timeout' definition even when statistics are
   reported individually. All of them should be considered, as the
   defaults in many implementations are highly undesirable; even
   infinite timeouts have been observed.

3.  TCP handshake

3.1.  TCP Fast Open

   TCP Fast Open (a.k.a. TFO, [RFC7413]) allows data to be sent in the
   TCP handshake, thereby allowing a request to be sent without any
   delay when no connection is open yet.

   TFO requires both client and server support, and additionally
   requires application knowledge, because the data sent on the SYN
   needs to be idempotent. Therefore, TFO can only be used on
   idempotent, safe HTTP methods (e.g., GET and HEAD), or with
   intervening negotiation (e.g., using TLS). It should be noted that
   TFO requires a secret to be defined on the server to mitigate the
   security vulnerabilities it introduces. TFO therefore requires more
   server side deployment planning than other enhancements.
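   As a rough sketch of the server side only, and assuming a Linux
   system whose <netinet/tcp.h> defines TCP_FASTOPEN, a listening
   socket could request TFO as shown here. The helper name
   listen_with_tfo, the loopback address and the queue length are
   illustrative assumptions, not recommendations of this document. The
   system-wide net.ipv4.tcp_fastopen setting must also permit
   server-side TFO for the option to have any effect.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from this document): create an IPv4 TCP
 * socket listening on the loopback address at `port` (0 lets the
 * kernel choose a port) and ask for TCP Fast Open with room for
 * `tfo_queue_len` pending TFO connections.
 * Returns the listening descriptor, or -1 on error.
 */
int listen_with_tfo(unsigned short port, int tfo_queue_len)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }

#ifdef TCP_FASTOPEN
    /* Best effort: if this fails, the socket still works as an
     * ordinary listener that uses the regular three-way handshake. */
    (void)setsockopt(fd, IPPROTO_TCP, TCP_FASTOPEN,
                     &tfo_queue_len, sizeof(tfo_queue_len));
#endif

    if (listen(fd, SOMAXCONN) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

   A caller would then accept() on the returned descriptor as usual.
   Clients without TFO support still connect through the normal
   handshake, so the option can be requested unconditionally and its
   failure ignored, which is the choice made in this sketch.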
   Support for TFO is growing in client platforms, especially mobile,
   due to the significant performance advantage it gives.

3.2.  Initial Congestion Window

   [RFC6928] specifies an initcwnd (initial congestion window) of 10,
   and is now fairly widely deployed server-side. There has been
   experimentation with larger initial windows, in combination with
   packet pacing. Many implementations allow initcwnd to be applied to
   specific routes, which allows a greater degree of flexibility than
   some other TCP parameters.

   IW10 has been reported to perform fairly well even in high volume
   servers.

3.3.  TCP SYN flood handling

   TCP SYN Flood mitigations [RFC4987] are necessary and there will be
   thresholds to tweak.

4.  TCP transfers

4.1.  Packet Pacing

   TBD

4.2.  Explicit Congestion Control

   Apple is deploying ECN in iOS and OS X [1].

4.3.  Nagle's Algorithm

   Nagle's Algorithm [RFC0896] is the mechanism that makes the TCP stack
   hold (small) outgoing packets for a short period of time so that it
   can potentially merge that packet with the next outgoing one. It is
   optimized for throughput at the expense of latency.

   HTTP/2 in particular requires that the client can send a packet back
   fast even during transfers that are perceived as single direction
   transfers. Even small delays in those sends can cause a significant
   performance loss.

   HTTP/1.1 is also affected, especially when sending off a full request
   in a single write() system call.

   In POSIX systems you switch it off like this:

     /* needs <sys/socket.h> and <netinet/tcp.h>; fd is a TCP socket */
     int one = 1;
     setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));

4.4.  Keep-alive

   TCP keep-alive is likely to be disabled, at least on mobile clients,
   for energy saving purposes. Application-level keep-alive is then
   required for long-lived requests to detect failed peers or
   connections reset by stateful firewalls etc.

5.  Re-using connections

5.1.  Slow Start after Idle

   Slow-start is one of the algorithms that TCP uses to control
   congestion inside the network. It is also known as the exponential
   growth phase. Each TCP connection will start off in slow-start but
   will also go back to slow-start after a certain amount of idle time.

   In Linux systems you can prevent the TCP stack from going back to
   slow-start after idle by setting:

     net.ipv4.tcp_slow_start_after_idle = 0

5.2.  TCP-Bound Authentications

   There are several HTTP authentication mechanisms in use today that
   are used or can be used to authenticate a connection rather than a
   single HTTP request. Two popular ones are NTLM and Negotiate.

   If such an authentication has been negotiated on a TCP connection,
   that connection can remain authenticated throughout the rest of its
   lifetime. This discrepancy with how other HTTP authentications work
   makes it important to handle these connections with care.

6.  Closing connections

6.1.  Half-close

   The client or server is free to half-close after a request or
   response has been completed, or when there is no pending stream in
   HTTP/2.

   Half-closing is sometimes the only way for a server to make sure it
   closes down connections cleanly so that it doesn't accept more
   requests while still allowing clients to receive the ongoing
   responses.

6.2.  Abort

   No client abort for HTTP/1.1 after the request body has been sent.
   Delayed full close is expected following an error response to avoid
   RST on the client.

6.3.  Close Idle Connections

   Keeping open connections around for subsequent connection reuse is
   key for many HTTP clients' performance. The value of an existing
   connection quickly degrades and after only a few minutes the chance
   that a connection will successfully get reused by a web browser is
   slim.

6.4.  Tail Loss Probes

   draft [2]

7.  IANA Considerations

   This document does not require action from IANA.

8.  Security Considerations

   TBD

9.  References

9.1.  Normative References

   [RFC0793]  Postel, J., "Transmission Control Protocol", STD 7,
              RFC 793, DOI 10.17487/RFC0793, September 1981.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC7230]  Fielding, R., Ed. and J. Reschke, Ed., "Hypertext Transfer
              Protocol (HTTP/1.1): Message Syntax and Routing",
              RFC 7230, DOI 10.17487/RFC7230, June 2014.

   [RFC7540]  Belshe, M., Peon, R., and M. Thomson, Ed., "Hypertext
              Transfer Protocol Version 2 (HTTP/2)", RFC 7540,
              DOI 10.17487/RFC7540, May 2015.

9.2.  Informative References

   [RFC0896]  Nagle, J., "Congestion Control in IP/TCP Internetworks",
              RFC 896, DOI 10.17487/RFC0896, January 1984.

   [RFC4987]  Eddy, W., "TCP SYN Flooding Attacks and Common
              Mitigations", RFC 4987, DOI 10.17487/RFC4987, August 2007.

   [RFC6928]  Chu, J., Dukkipati, N., Cheng, Y., and M. Mathis,
              "Increasing TCP's Initial Window", RFC 6928,
              DOI 10.17487/RFC6928, April 2013.

   [RFC7141]  Briscoe, B. and J. Manner, "Byte and Packet Congestion
              Notification", BCP 41, RFC 7141, DOI 10.17487/RFC7141,
              February 2014.

   [RFC7323]  Borman, D., Braden, B., Jacobson, V., and R.
              Scheffenegger, Ed., "TCP Extensions for High Performance",
              RFC 7323, DOI 10.17487/RFC7323, September 2014.

   [RFC7413]  Cheng, Y., Chu, J., Radhakrishnan, S., and A. Jain, "TCP
              Fast Open", RFC 7413, DOI 10.17487/RFC7413, December 2014.

9.3.  URIs

   [1] https://developer.apple.com/videos/wwdc/2015/?id=719

   [2] http://tools.ietf.org/html/draft-dukkipati-tcpm-tcp-loss-probe-01

Appendix A.  Acknowledgments

   This specification builds upon previous work and help from Mark
   Nottingham and Craig Taylor.

Author's Address

   Daniel Stenberg
   Mozilla

   Email: daniel@haxx.se
   URI:   http://daniel.haxx.se