httpbis                                                      D. Stenberg
Internet-Draft                                                   Mozilla
Intended status: Best Current Practice                  November 6, 2015
Expires: May 9, 2016

                          TCP Tuning for HTTP
                     draft-stenberg-httpbis-tcp-00

Abstract

   This document records current best practice for using all versions of
   HTTP over TCP.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on May 9, 2016.

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Notational Conventions
   2.  Socket planning
     2.1.  Number of open files
     2.2.  Number of concurrent network messages
     2.3.  Number of incoming TCP SYNs allowed to backlog
     2.4.  Use the whole port range for local ports
     2.5.  Lower the TCP FIN timeout
     2.6.  Re-use sockets in TIME_WAIT state
     2.7.  Give the TCP stack enough memory
     2.8.  Set maximum allowed TCP window sizes
     2.9.  Timers and time-outs
   3.  TCP handshake
     3.1.  TCP Fast Open
     3.2.  Initial Congestion Window
     3.3.  TCP SYN flood handling
   4.  TCP transfers
     4.1.  Packet Pacing
     4.2.  Explicit Congestion Control
     4.3.  Nagle's Algorithm
     4.4.  Keep-alive
   5.  Re-using connections
     5.1.  Slow Start after Idle
     5.2.  TCP-Bound Authentications
   6.  Closing connections
     6.1.  Half-close
     6.2.  Abort
     6.3.  Close Idle Connections
     6.4.  Tail Loss Probes
   7.  IANA Considerations
   8.  Security Considerations
   9.  References
     9.1.  Normative References
     9.2.  Informative References
     9.3.  URIs
   Appendix A.  Acknowledgments
   Author's Address

1.  Introduction

   HTTP version 1.1 [RFC7230] as well as HTTP version 2 [RFC7540] are
   defined to use TCP [RFC0793], and their performance can depend
   greatly upon how TCP is configured.  This document records best
   current practice for using HTTP over TCP, with a focus on improving
   end-user perceived performance.

   These practices are generally applicable to HTTP/1 as well as HTTP/2,
   although some may note particular impact or nuance regarding a
   particular protocol version.

   There are countless scenarios, roles and setups in which HTTP is
   being used, so there can be no single specific "Right Answer" to most
   TCP questions.  This document intends only to cover the most
   important areas of concern and suggest possible actions.

1.1.  Notational Conventions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

2.  Socket planning

   Your HTTP server or intermediary may need configuration changes to
   some system tunables and timeout periods to perform optimally.
   Actual values will depend on how you are scaling the platform,
   horizontally or vertically, and on other connection semantics.
   Changing system limits and altering thresholds will change the
   behavior of your web service and its dependencies.  These
   dependencies are usually shared with other services running on the
   same system, so good planning and testing are advised.

   This is a list of values to consider and some general advice on how
   they can be modified on Linux systems.

2.1.  Number of open files

   A modern HTTP server will serve a large number of TCP connections,
   and in most systems each open socket equals one open file.  Make sure
   that the open file limit isn't a bottleneck.  In Linux, the limit can
   be raised like this:

      fs.file-max =

2.2.  Number of concurrent network messages

   Raise the number of packets allowed to get queued when a particular
   interface receives packets faster than the kernel can process them.
   In Linux, this limit can be raised like this:

      net.core.netdev_max_backlog =

2.3.  Number of incoming TCP SYNs allowed to backlog

   Raise the number of new connection requests that are allowed to queue
   up in the kernel.  In Linux, this limit, which caps the backlog value
   an application passes to listen(), can be raised like this:

      net.core.somaxconn =

2.4.  Use the whole port range for local ports

   To make sure the TCP stack can take full advantage of the entire set
   of possible sockets, give it a larger range of local port numbers to
   use.

      net.ipv4.ip_local_port_range = 1024 65535

2.5.  Lower the TCP FIN timeout

   Lower the time during which connections stay in the FIN-WAIT-2 state
   so that they can be re-used faster, which increases the number of
   simultaneous connections possible.

      net.ipv4.tcp_fin_timeout =

2.6.  Re-use sockets in TIME_WAIT state

   Especially when running backend servers that sit behind edge servers
   fronting them to the Internet, allow reuse of sockets in TIME_WAIT
   state for new connections when it is safe from the network stack's
   perspective.

      net.ipv4.tcp_tw_reuse = 1

2.7.  Give the TCP stack enough memory

   Systems meant to handle and serve a huge number of TCP connections at
   high speeds need a significant amount of memory for TCP stack
   buffers.  On some systems you can tell the TCP stack what default
   buffer sizes to use and how much they are allowed to dynamically grow
   and shrink.
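
   On Linux, the two settings that hold these buffer sizes,
   net.ipv4.tcp_rmem and net.ipv4.tcp_wmem, each contain three numbers:
   a minimum, a default and a maximum size in bytes.  Before changing
   them it can help to read the current values.  The following is only a
   minimal sketch in C, assuming a Linux host that exposes sysctl values
   as files under /proc/sys.  The struct and function names
   (tcp_mem_triple, sysctl_to_path, parse_tcp_mem_triple, read_tcp_mem)
   are hypothetical and are not part of any existing API.

```c
#include <stdio.h>
#include <string.h>

/* The three values Linux keeps in net.ipv4.tcp_rmem and
 * net.ipv4.tcp_wmem: minimum, default and maximum buffer size per
 * socket, in bytes. (Struct name is illustrative only.) */
struct tcp_mem_triple {
    long min_bytes;
    long default_bytes;
    long max_bytes;
};

/* Turn a dotted sysctl name into its /proc/sys path, for example
 * "net.ipv4.tcp_rmem" becomes "/proc/sys/net/ipv4/tcp_rmem".
 * Returns 0 on success, -1 if the result does not fit in out. */
int sysctl_to_path(const char *name, char *out, size_t out_len)
{
    int n = snprintf(out, out_len, "/proc/sys/%s", name);
    if (n < 0 || (size_t)n >= out_len)
        return -1;
    /* Only the part after the fixed prefix contains dots to replace. */
    for (char *p = out + strlen("/proc/sys/"); *p != '\0'; p++) {
        if (*p == '.')
            *p = '/';
    }
    return 0;
}

/* Parse the "min default max" text read from such a file.
 * Returns 0 on success, -1 if three numbers were not found. */
int parse_tcp_mem_triple(const char *text, struct tcp_mem_triple *t)
{
    if (sscanf(text, "%ld %ld %ld",
               &t->min_bytes, &t->default_bytes, &t->max_bytes) != 3)
        return -1;
    return 0;
}

/* Read and parse one of the two settings. Returns 0 on success and
 * -1 on any error, for example on a system without /proc/sys. */
int read_tcp_mem(const char *name, struct tcp_mem_triple *t)
{
    char path[256];
    char line[128];
    FILE *f;
    int rc = -1;

    if (sysctl_to_path(name, path, sizeof(path)) != 0)
        return -1;
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(line, sizeof(line), f) != NULL)
        rc = parse_tcp_mem_triple(line, t);
    fclose(f);
    return rc;
}
```

   A caller could, for example, call read_tcp_mem("net.ipv4.tcp_wmem",
   &t) and compare t.max_bytes with the buffer size it expects to need.
   Parsing is kept separate from file access so that it can be exercised
   on systems that have no /proc/sys.
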
   On a Linux system, you can control these buffer sizes like this:

      net.ipv4.tcp_wmem =
      net.ipv4.tcp_rmem =

2.8.  Set maximum allowed TCP window sizes

   You may have to increase the largest allowed window size.

      net.core.rmem_max =
      net.core.wmem_max =

2.9.  Timers and time-outs

   Fail fast.  Do not allow very long time-outs.  Making users wait
   several minutes for a network operation that will fail anyway won't
   make any of them happy.

   Avoid long-lived TCP flows that are (seemingly) idle.  Use HTTP
   continuations, redirects, 202 (Accepted) responses or similar
   instead.

3.  TCP handshake

3.1.  TCP Fast Open

   TCP Fast Open (a.k.a. TFO, [RFC7413]) allows data to be sent in the
   TCP handshake, thereby allowing a request to be sent without any
   delay when no connection is open yet.

   TFO requires both client and server support, and additionally
   requires application knowledge, because the data sent on the SYN
   needs to be idempotent.  Therefore, TFO can only be used on
   idempotent, safe HTTP methods (e.g., GET and HEAD), or with
   intervening negotiation (e.g., using TLS).

   Support for TFO is growing in client platforms, especially mobile,
   due to the significant performance advantage it gives.

3.2.  Initial Congestion Window

   [RFC6928] specifies an initial congestion window of 10 segments
   (IW10), and is now fairly widely deployed server-side.  There has
   been experimentation with larger initial windows, in combination with
   packet pacing.

   IW10 has been reported to perform fairly well even in high-volume
   servers.

3.3.  TCP SYN flood handling

   TCP SYN flood mitigations [RFC4987] are necessary and there will be
   thresholds to tweak.

4.  TCP transfers

4.1.  Packet Pacing

   TBD

4.2.  Explicit Congestion Control

   Apple is deploying Explicit Congestion Notification (ECN) in iOS and
   OS X [1].

4.3.  Nagle's Algorithm

   Nagle's Algorithm [RFC0896] is the mechanism that makes the TCP stack
   hold a (small) outgoing packet for a short period of time so that it
   can potentially merge that packet with the next outgoing one.  It is
   optimized for throughput at the expense of latency.

   HTTP/2 in particular requires that the client can send a packet back
   fast even during transfers that are perceived as single-direction
   transfers.  Even small delays in those sends can cause a significant
   performance loss.

   HTTP/1.1 is also affected, especially when sending off a full request
   in a single write() system call.

   On POSIX systems, Nagle's Algorithm can be switched off like this:

      /* fd is a TCP socket; a non-zero value disables Nagle */
      int one = 1;
      setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));

4.4.  Keep-alive

   TCP keep-alive is likely to be disabled, at least on mobile clients,
   to save energy.  Application-level keep-alive is then required for
   long-lived requests in order to detect failed peers or connections
   reset by stateful firewalls.

5.  Re-using connections

5.1.  Slow Start after Idle

   Slow-start is one of the algorithms that TCP uses to control
   congestion inside the network.  It is also known as the exponential
   growth phase.  Each TCP connection will start off in slow-start but
   will also go back to slow-start after a certain amount of idle time.

   In Linux systems you can prevent the TCP stack from going back to
   slow-start after idle by setting:

      net.ipv4.tcp_slow_start_after_idle = 0

5.2.  TCP-Bound Authentications

   There are several HTTP authentication mechanisms in use today that
   are used or can be used to authenticate a connection rather than a
   single HTTP request.  Two popular ones are NTLM and Negotiate.

   If such an authentication has been negotiated on a TCP connection,
   that connection can remain authenticated throughout the rest of its
   lifetime.
   This difference from how other HTTP authentication mechanisms work
   makes it important to handle these connections with care.

6.  Closing connections

6.1.  Half-close

   Either the client or the server is free to half-close a connection
   after a request or response has been completed or, in HTTP/2, when no
   stream is pending.

   Half-closing is sometimes the only way for a server to make sure it
   closes down connections cleanly, so that it doesn't accept more
   requests while still allowing clients to receive the ongoing
   responses.

6.2.  Abort

   No client abort for HTTP/1.1 after the request body has been sent.
   A delayed full close is expected following an error response, to
   avoid an RST on the client.

6.3.  Close Idle Connections

   Keeping open connections around for subsequent connection re-use is
   key for many HTTP clients' performance.  The value of an existing
   connection degrades quickly: after only a few minutes, the chance
   that a connection will successfully get re-used by a web browser is
   slim.

6.4.  Tail Loss Probes

   See the Tail Loss Probe draft [2].

7.  IANA Considerations

   This document does not require action from IANA.

8.  Security Considerations

   TBD

9.  References

9.1.  Normative References

   [RFC0793]  Postel, J., "Transmission Control Protocol", STD 7,
              RFC 793, DOI 10.17487/RFC0793, September 1981.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC7230]  Fielding, R., Ed. and J. Reschke, Ed., "Hypertext Transfer
              Protocol (HTTP/1.1): Message Syntax and Routing",
              RFC 7230, DOI 10.17487/RFC7230, June 2014.

   [RFC7540]  Belshe, M., Peon, R., and M. Thomson, Ed., "Hypertext
              Transfer Protocol Version 2 (HTTP/2)", RFC 7540,
              DOI 10.17487/RFC7540, May 2015.

9.2.  Informative References

   [RFC0896]  Nagle, J., "Congestion Control in IP/TCP Internetworks",
              RFC 896, DOI 10.17487/RFC0896, January 1984.

   [RFC4987]  Eddy, W., "TCP SYN Flooding Attacks and Common
              Mitigations", RFC 4987, DOI 10.17487/RFC4987, August 2007.

   [RFC6928]  Chu, J., Dukkipati, N., Cheng, Y., and M. Mathis,
              "Increasing TCP's Initial Window", RFC 6928,
              DOI 10.17487/RFC6928, April 2013.

   [RFC7413]  Cheng, Y., Chu, J., Radhakrishnan, S., and A. Jain, "TCP
              Fast Open", RFC 7413, DOI 10.17487/RFC7413, December 2014.

9.3.  URIs

   [1] https://developer.apple.com/videos/wwdc/2015/?id=719

   [2] http://tools.ietf.org/html/draft-dukkipati-tcpm-tcp-loss-probe-01

Appendix A.  Acknowledgments

   This specification builds upon previous work and help from Mark
   Nottingham and Craig Taylor.

Author's Address

   Daniel Stenberg
   Mozilla

   Email: daniel@haxx.se
   URI:   http://daniel.haxx.se