Network Working Group                                           N. Banks
Internet-Draft                                     Microsoft Corporation
Intended status: Experimental                          December 23, 2020
Expires: June 26, 2021

                            QUIC Performance
                   draft-banks-quic-performance-00

Abstract

   The QUIC performance protocol provides a simple, general-purpose
   protocol for testing the performance characteristics of a QUIC
   implementation.  With this protocol a generic server can support any
   number of client-driven performance tests and configurations.
   Standardizing the performance protocol allows for easy comparisons
   across different QUIC implementations.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.
   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on June 26, 2021.

Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Terms and Definitions
   2.  Specification
     2.1.  Protocol Negotiation
     2.2.  Configuration
     2.3.  Streams
       2.3.1.  Encoding Server Response Size
       2.3.2.  Bidirectional vs Unidirectional Streams
   3.  Example Performance Scenarios
     3.1.  Single Connection Bulk Throughput
     3.2.  Requests Per Second
     3.3.  Handshakes Per Second
     3.4.  Throughput Fairness Index
     3.5.  Maximum Number of Idle Connections
   4.  Things to Note
     4.1.  What Data Should be Sent?
     4.2.  Ramp up Congestion Control or Not?
     4.3.  Disabling Encryption
   5.  Security Considerations
   6.  IANA Considerations
   7.  Normative References
   Author's Address

1.  Introduction

   The various QUIC implementations are still quite young and not
   exhaustively tested for many performance-heavy scenarios.  Some have
   done their own testing, but many are just starting this process.
   Additionally, most only test the performance between their own
   client and server.  The QUIC performance protocol aims to
   standardize the performance testing mechanisms.  This will hopefully
   achieve the following:

   o  Remove the need to redesign a performance test for each QUIC
      implementation.

   o  Provide standard test cases that can produce performance metrics
      that can be easily compared across different configurations and
      implementations.

   o  Allow for easy cross-implementation performance testing.

1.1.  Terms and Definitions

   The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

2.  Specification

   The sections below describe the mechanisms used by a client to
   connect to a QUIC perf server and execute various performance
   scenarios.

2.1.  Protocol Negotiation

   The ALPN used by the QUIC performance protocol is "perf".  It can be
   used on any UDP port, but UDP port 443 is the default if no other
   port is specified.  No SNI is required to connect, but it may be
   provided if the client wishes.

2.2.  Configuration

   TODO - Possible options: use the first stream to exchange
   configuration data OR use a custom transport parameter.

2.3.  Streams

   The performance protocol is primarily centered around sending and
   receiving data.  Streams are the primary vehicle for this.  All
   performance tests are client-driven:

   o  The client opens a stream.

   o  The client encodes the size of the requested server response.

   o  The client sends any data it wishes to.

   o  The client cleanly closes the stream with a FIN.

   When a server receives a stream, it does the following:

   o  The server accepts the new stream.

   o  The server processes the encoded response size.

   o  The server drains the rest of the client data.

   o  The server then sends any response payload that was requested.

   *Note* - Should the server wait for the FIN before replying?

2.3.1.  Encoding Server Response Size

   Every stream opened by the client uses the first 8 bytes of the
   stream data to encode a 64-bit unsigned integer in network byte
   order, indicating the length of data the client wishes the server
   to respond with.  An encoded value of zero is perfectly legal, and
   a value of MAX_UINT64 (0xFFFFFFFFFFFFFFFF) is used in practice to
   indicate an unlimited server response.  The client may then cancel
   the transfer at its convenience with a STOP_SENDING frame.
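   As a non-normative illustration, the prefix described above can be
   encoded and decoded with Python's struct module, where "!Q" denotes
   an unsigned 64-bit integer in network byte order (the helper name is
   purely illustrative):

```python
import struct

def decode_response_size(stream_prefix: bytes) -> int:
    # The first 8 bytes of every client-opened stream carry the
    # requested server response length as a 64-bit unsigned integer
    # in network (big-endian) byte order.
    if len(stream_prefix) < 8:
        raise ValueError("need the first 8 bytes of stream data")
    (size,) = struct.unpack("!Q", stream_prefix[:8])
    return size

# Zero is legal (no response requested) ...
assert decode_response_size(b"\x00" * 8) == 0
# ... and MAX_UINT64 practically means an unlimited response.
assert decode_response_size(b"\xff" * 8) == 0xFFFFFFFFFFFFFFFF
```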
   On the server side, any stream that is closed before all 8 bytes
   are received should simply be ignored and gracefully closed on the
   server's end (if applicable).

2.3.2.  Bidirectional vs Unidirectional Streams

   When a client uses a bidirectional stream to request a response
   payload from the server, the server sends the requested data on the
   same stream.  If no data is requested by the client, the server
   merely closes its side of the stream.

   When a client uses a unidirectional stream to request a response
   payload from the server, the server opens a new unidirectional
   stream to send the requested data.  If no data is requested by the
   client, the server need take no action.

3.  Example Performance Scenarios

   All stream-payload-based tests below can be run with either
   bidirectional or unidirectional streams.  Generally, the goal of
   all these performance tests is to measure the maximum load that can
   be achieved with the given QUIC implementation and hardware
   configuration.  To that end, the network is not expected to be the
   bottleneck in any of these tests, so appropriate network hardware
   must be used so as to not limit throughput.

3.1.  Single Connection Bulk Throughput

   Bulk data throughput on a single QUIC connection is probably the
   most common metric when first discussing the performance of a QUIC
   implementation.  It uses only a single QUIC connection.  It may be
   either an upload or a download, and it can be of any desired
   length.

   For an upload test, the client need only open a single stream,
   encode a zero server response size, send the upload payload, and
   then close (FIN) the stream.

   For a download test, the client again opens a single stream,
   encodes the server's response size (N bytes), and then closes the
   stream.
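   In both flows, the bytes the client writes on its stream are just
   the 8-byte response-size prefix followed by any upload payload.  A
   minimal sketch (the build_request helper is hypothetical, not part
   of the protocol):

```python
import struct

UNLIMITED = 0xFFFFFFFFFFFFFFFF  # practically "respond until cancelled"

def build_request(response_size: int, upload_payload: bytes = b"") -> bytes:
    # 8-byte network-byte-order response size, then any client data.
    return struct.pack("!Q", response_size) + upload_payload

# Upload test: request a zero-length response and send the payload.
upload_stream_data = build_request(0, b"\x00" * 1024)

# Download test: request N bytes back and send nothing else.
download_stream_data = build_request(10_000_000)
```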
   The total throughput rate is measured by the client, and is
   calculated by dividing the total bytes sent or received by the
   difference in time from when the client created its initial stream
   to the time the client received the server's FIN.

3.2.  Requests Per Second

   Another very common performance metric is the maximum number of
   requests per second (RPS) that a QUIC server can handle.  Unlike
   the bulk throughput test above, this test generally requires many
   parallel connections (possibly from multiple client machines) in
   order to saturate the server properly.  Several variables tend to
   directly affect the results of this test:

   o  The number of parallel connections.

   o  The size of the client's request.

   o  The size of the server's response.

   All of the above variables may be changed to measure the maximum
   RPS in the given scenario.

   The test starts with the client opening all parallel connections
   and waiting for them to be connected.  It is recommended to wait an
   additional couple of seconds for things to settle down.

   The client then starts sending "requests" on each connection.
   Specifically, the client should keep at least one request pending
   (preferably at least two) on each connection at all times.  When a
   request completes (the server's FIN is received), the client should
   immediately queue another request.

   The client continues to do this for a configured period of time.
   From the author's testing, ten seconds seems to be a good amount of
   time to reach a steady state.

   Finally, the client calculates the requests-per-second rate as the
   total number of requests completed divided by the total execution
   time of the request phase of the connection (not including the
   handshake and wait period).

3.3.  Handshakes Per Second

   Another metric that may reveal connection setup efficiency is
   handshakes per second.
   Multiple clients (possibly from multiple machines) set up QUIC
   connections with a single server and then close them with
   CONNECTION_CLOSE.  Variables that may affect the results are:

   o  The number of client machines.

   o  The number of connections a client can initiate in a second.

   o  The size of the ClientHello (long list of supported ciphers,
      versions, etc.).

   All of these variables may be changed to measure the maximum
   handshakes per second in a given scenario.

   The test starts with the multiple clients initiating connections
   and waiting for them to be connected to the single server on the
   other machine.  It is recommended to wait an additional couple of
   seconds for connections to settle down.

   The clients initiate as many connections as possible in order to
   saturate the server.  Once a client completes the handshake with
   the server, it terminates the connection by sending a
   CONNECTION_CLOSE to the server.  The total handshakes-per-second
   rate is calculated by dividing the total number of connections
   successfully established during the measurement period by the
   length of that period.

3.4.  Throughput Fairness Index

   Connection fairness reveals how throughput is allocated among
   connections.  One way to measure it is to establish hundreds or
   thousands of concurrent connections and request the same data block
   from a single server.  Variables that may impact the results are:

   o  The size of the data being requested.

   o  The number of concurrent connections.

   The test starts by establishing several hundred or thousand
   concurrent connections and downloading the same data block from the
   server simultaneously.

   The fairness index is calculated from the completion time of each
   connection and the size of the data block, in Jain's manner
   (https://www.cse.wustl.edu/~jain/atmf/ftp/af_fair.pdf).
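   Jain's fairness index over per-connection throughputs x_i is
   (sum x_i)^2 / (n * sum x_i^2): it is 1.0 when every connection gets
   equal throughput and approaches 1/n when a single connection
   dominates.  A sketch with illustrative numbers:

```python
def jains_fairness_index(throughputs):
    # (sum x)^2 / (n * sum x^2); 1.0 is perfectly fair, 1/n is the
    # worst case where one connection gets all the throughput.
    n = len(throughputs)
    total = sum(throughputs)
    return (total * total) / (n * sum(x * x for x in throughputs))

# Same data block, per-connection completion times in seconds
# (illustrative values only).
block_size = 100_000_000  # bytes
completion_times = [9.8, 10.1, 10.3, 24.7]
index = jains_fairness_index([block_size / t for t in completion_times])
```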
   Note that the relationship between fairness and whether the link is
   saturated is uncertain before any test.  Thus it is recommended
   that both cases be covered in the test.

   TODO: is it necessary to also provide tests on latency fairness in
   the multi-connection case?

3.5.  Maximum Number of Idle Connections

   TODO

4.  Things to Note

   There are a few important things to note when doing performance
   testing.

4.1.  What Data Should be Sent?

   Since the goal here is to measure the efficiency of the QUIC
   implementation and not any application protocol, the performance
   application layer should be as lightweight as possible.  To this
   end, the client and server application layer may use a single
   preallocated and initialized buffer that is queued for sending
   whenever any payload needs to be sent out.

4.2.  Ramp up Congestion Control or Not?

   When running performance tests that are CPU limited rather than
   network limited, the congestion control state ideally does not
   matter much.  That said, assuming the tests run long enough,
   congestion control should generally ramp up very quickly and not be
   a measurable factor in the resulting measurements.

4.3.  Disabling Encryption

   A common topic when talking about QUIC performance is the effect of
   its encryption.  The draft-banks-quic-disable-encryption draft
   specifies a way for encryption to be mutually negotiated off so
   that an A/B test can be made to measure the "cost of encryption" in
   QUIC.

5.  Security Considerations

   Since the performance protocol allows a client to trivially request
   that the server do a significant amount of work, it is generally
   advisable not to deploy a server running this protocol on the open
   internet.
   One possible mitigation for unauthenticated clients generating an
   unacceptable amount of work on the server would be to use client
   certificates to authenticate the clients first.

6.  IANA Considerations

   None.

7.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017, <https://www.rfc-editor.org/info/rfc8174>.

Author's Address

   Nick Banks
   Microsoft Corporation

   Email: nibanks@microsoft.com