Network Working Group                                           T. Pauly
Internet-Draft                                                Apple Inc.
Intended status: Standards Track                        October 24, 2017
Expires: April 27, 2018


         Guidelines for Racing During Connection Establishment
                     draft-pauly-taps-guidelines-01

Abstract

   Often, connections created across the Internet have multiple options
   of how to communicate: address families, specific IP addresses,
   network attachments, and application and transport protocols. This
   document describes how an implementation can race multiple options
   during connection establishment, and expose this functionality
   through an API.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.
   It is inappropriate to use Internet-Drafts as reference material or
   to cite them other than as "work in progress."

   This Internet-Draft will expire on April 27, 2018.

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Introduction
   2. Terminology
      2.1. Endpoint
      2.2. Derived Endpoint
      2.3. Path
      2.4. Connection
   3. Connection Establishment Overview
   4. Structuring Options as a Tree
      4.1. Branch Types
         4.1.1. Derived Endpoints
         4.1.2. Alternate Paths
         4.1.3. Protocol Options
      4.2. Branching Order-of-Operations
   5. Connection Establishment Dynamics
      5.1. Building the Tree
      5.2. Racing Methods
         5.2.1. Delayed Racing
         5.2.2. Failover
      5.3. Completing Establishment
         5.3.1. Determining Successful Establishment
   6. API Considerations
      6.1. Handling 0-RTT Data
   7. Security Considerations
   8. IANA Considerations
   9. Acknowledgments
   10. Informative References
   Author's Address

1. Introduction

   Often, connections created across the Internet have multiple options
   of how to communicate: address families, specific IP addresses,
   network attachments, and application and transport protocols. If an
   application chooses to only attempt one of these options, it may fail
   to connect, or end up using a suboptimal path. If an application
   chooses to attempt one option after another, waiting for each to fail
   or time out, a user of the application may need to wait for a very
   long time before progress is made. And, if an application
   simultaneously attempts all options, it may unnecessarily consume
   significant local or network resources.

   In order to solve this, applications can employ a method of racing
   their various connection establishment options. This approach is
   commonly used for racing multiple IP address families, the algorithm
   for which is referred to as "Happy Eyeballs"
   [I-D.ietf-v6ops-rfc6555bis]. However, the approach can apply more
   generally.

   This document describes how an implementation can race multiple
   options during connection establishment, and expose this
   functionality through an API.

2. Terminology

   This document uses specific terminology when discussing connection
   establishment.

2.1. Endpoint

   An identifier for a network service. Generally there is a concept of
   both a local and remote endpoint. Endpoints are the targets of
   network connections. If an endpoint of a given type cannot be
   directly used, it should be resolved into one or more endpoints of
   another type. Examples of endpoint types include:

   o  IP address + port

   o  Hostname + port

   o  Service name + type + domain

   o  URI

2.2. Derived Endpoint

   A derived endpoint is an endpoint that is not the original target of
   an API client, but an endpoint created from the original endpoint
   through transformation or lookup. Derivation may take the form of
   hostname resolution into addresses, synthesis between address types,
   or changing to a different endpoint entirely based on a configuration
   requirement. For example, if a proxy server must be used for a
   connection, the endpoint that represents the proxy is a derived
   endpoint.

2.3. Path

   A view of network properties that can be used to communicate to an
   endpoint from the current system. This is sometimes referred to as a
   Provisioning Domain (PvD) [RFC7556]. The path may include properties
   of the addresses and routes being used, the network interfaces being
   used, and other metadata about the network learned from configuration
   or negotiation.

2.4. Connection

   A flow of data between two endpoints. A connection is created with a
   target remote endpoint, and a set of parameters indicating client
   preferences for path selection and protocol options.

3. Connection Establishment Overview

   The process of establishing a network connection begins when an
   application expresses intent to communicate with a remote endpoint
   (along with any constraints or requirements it may have on the
   connection).
   The process can be considered complete once there is at least one
   set of network protocols that has completed any required setup to
   the point that it can transmit and receive the application's data.

   Looking more closely, connection establishment has three required
   steps that must be performed by some entity on a system:

   1. Identifying the endpoint to which the connection should be
      established

   2. Choosing which path or interface to use

   3. Conducting the necessary set of protocol handshakes to establish
      the connection

   The simplest example of this process might involve identifying the
   single IP address to which the application wishes to connect, using
   the system's current default interface or path, and starting a TCP
   handshake to establish a stream to the specified IP address.
   However, each step may also vary depending on the requirements of the
   connection: if the endpoint is defined as a hostname and port, then
   there may be multiple resolved addresses that are available; there
   may also be multiple interfaces or paths available, other than the
   default system interface; and some protocols may not need any
   transport handshake to be considered "established" (such as UDP),
   while other connections may utilize layered protocol handshakes, such
   as TLS over TCP.

   Whenever an application has multiple options for connection
   establishment, it can view the set of all individual connection
   establishment options as a single, aggregate connection
   establishment. The aggregate set conceptually includes every valid
   combination of endpoints, paths, and protocols. As an example,
   consider an application that initiates a TCP connection to a
   hostname + port endpoint, and has two valid interfaces available
   (Wi-Fi and LTE).
   The hostname resolves to a single IPv4 address on the Wi-Fi network,
   and resolves to the same IPv4 address on the LTE network, as well as
   a single IPv6 address. The aggregate set of connection establishment
   options can be viewed as follows:

   Aggregate [Endpoint: www.example.com:80] [Interface: Any] [Protocol: TCP]
   |-> [Endpoint: 192.0.2.1:80]   [Interface: Wi-Fi] [Protocol: TCP]
   |-> [Endpoint: 192.0.2.1:80]   [Interface: LTE]   [Protocol: TCP]
   |-> [Endpoint: 2001:DB8::1.80] [Interface: LTE]   [Protocol: TCP]

   Any one of these sub-entries on the aggregate connection attempt
   would satisfy the original application intent. The concern of this
   document is the algorithm defining which of these options to try,
   when, and in what order.

4. Structuring Options as a Tree

   When an implementation responsible for connection establishment needs
   to consider multiple options, it SHOULD logically structure these
   options as a hierarchical tree. Each leaf node of the tree
   represents a single, coherent connection attempt, with an Endpoint, a
   Path, and a set of protocols that can directly negotiate and send
   data on the network. Each node in the tree that is not a leaf
   represents a connection attempt that is either underspecified, or
   else includes multiple distinct options. For example, when
   connecting on an IP network, a connection attempt to a hostname and
   port is underspecified, because the connection attempt requires a
   resolved IP address as its remote endpoint. In this case, the node
   represented by the connection attempt to the hostname is a parent
   node, with child nodes for each IP address. Similarly, an
   application that is allowed to connect using multiple interfaces will
   have a parent node of the tree for the decision between the paths,
   with a branch for each interface.
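
   As an illustration only, the tree described above might be modeled
   as in the following Python sketch. The Node class, its field names,
   and the leaves() helper are hypothetical and are not defined by this
   document; the sketch also assumes that any node without children is
   a leaf, and that the tree is fully known up front.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Node:
    """One connection attempt: an (Endpoint, Path, Protocol) tuple."""
    endpoint: str
    path: str
    protocol: str
    children: List["Node"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        # Assumption of this sketch: no children means a directly
        # connectable, fully specified attempt.
        return not self.children

    def leaves(self) -> Iterator["Node"]:
        """Yield every leaf below this node, in child order."""
        if self.is_leaf():
            yield self
        for child in self.children:
            yield from child.leaves()


# The www.example.com example, grouped by interface.
wifi = Node("www.example.com:80", "Wi-Fi", "TCP",
            [Node("192.0.2.1:80", "Wi-Fi", "TCP")])
lte = Node("www.example.com:80", "LTE", "TCP",
           [Node("192.0.2.1:80", "LTE", "TCP"),
            Node("2001:DB8::1.80", "LTE", "TCP")])
root = Node("www.example.com:80", "Any", "TCP", [wifi, lte])

for leaf in root.leaves():
    print(leaf.endpoint, leaf.path, leaf.protocol)
```

   Any one of the three leaves printed by this sketch would satisfy the
   application's original request.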

   The example aggregate connection attempt above can be drawn as a tree
   by grouping the addresses resolved on the same interface into
   branches:

                              ||
                 +==========================+
                 |  www.example.com:80/Any  |
                 +==========================+
                  //                      \\
+==========================+      +==========================+
| www.example.com:80/Wi-Fi |      |  www.example.com:80/LTE  |
+==========================+      +==========================+
            ||                      //                    \\
+====================+  +====================+  +======================+
| 192.0.2.1:80/Wi-Fi |  |  192.0.2.1:80/LTE  |  |  2001:DB8::1.80/LTE  |
+====================+  +====================+  +======================+

   The rest of this document will use a notation scheme to represent
   this tree. The parent (or trunk) node of the tree will be
   represented by a single integer, such as "1". Each child of that
   node will have an integer that identifies it, from 1 to the number of
   children. That child node will be uniquely identified by
   concatenating its integer to its parent's identifier with a dot in
   between, such as "1.1" and "1.2". Each node will be summarized by a
   tuple of three elements: Endpoint, Path, and Protocol. The above
   example can now be written more succinctly as:

   1 [www.example.com:80, Any, TCP]
     1.1 [www.example.com:80, Wi-Fi, TCP]
       1.1.1 [192.0.2.1:80, Wi-Fi, TCP]
     1.2 [www.example.com:80, LTE, TCP]
       1.2.1 [192.0.2.1:80, LTE, TCP]
       1.2.2 [2001:DB8::1.80, LTE, TCP]

   When an application views this aggregate set of connection attempts
   as a single connection establishment, it will use only one of the
   leaf nodes to transfer data. Thus, when a single leaf node becomes
   ready to use, the entire connection attempt is ready for use by the
   application.
   Another way to represent this is that every leaf node updates the
   state of its parent node when it becomes ready, until the trunk node
   of the tree is ready, which then notifies the application that the
   connection as a whole is ready to use.

   A connection establishment tree may be degenerate, and only have a
   single leaf node, such as a connection attempt to an IP address over
   a single interface with a single protocol.

   1 [192.0.2.1:80, Wi-Fi, TCP]

   A parent node may also have only one child (or leaf) node, such as
   when a hostname resolves to only a single IP address.

   1 [www.example.com:80, Wi-Fi, TCP]
     1.1 [192.0.2.1:80, Wi-Fi, TCP]

4.1. Branch Types

   There are three types of branching from a parent node into one or
   more child nodes. Any parent node of the tree MUST use only one type
   of branching.

4.1.1. Derived Endpoints

   If a connection originally targets a single endpoint, there may be
   multiple endpoints of different types that can be derived from the
   original. The connection library should order the derived endpoints
   according to application preference and expected performance.

   DNS hostname-to-address resolution is the most common method of
   endpoint derivation. When trying to connect to a hostname endpoint
   on a traditional IP network, the implementation SHOULD send DNS
   queries for both A (IPv4) and AAAA (IPv6) records if both are
   supported on the local link. The algorithm for ordering and racing
   these addresses SHOULD follow the recommendations in Happy Eyeballs
   [I-D.ietf-v6ops-rfc6555bis].

   1 [www.example.com:80, Wi-Fi, TCP]
     1.1 [2001:DB8::1.80, Wi-Fi, TCP]
     1.2 [192.0.2.1:80, Wi-Fi, TCP]
     1.3 [2001:DB8::2.80, Wi-Fi, TCP]
     1.4 [2001:DB8::3.80, Wi-Fi, TCP]

   DNS-Based Service Discovery can also provide an endpoint derivation
   step.
   When trying to connect to a named service, the client may discover
   one or more hostname and port pairs on the local network using
   multicast DNS. These hostnames should each be treated as a branch
   which can be attempted independently from other hostnames. Each of
   these hostnames may also resolve to one or more addresses, thus
   creating multiple layers of branching.

   1 [term-printer._ipp._tcp.meeting.ietf.org, Wi-Fi, TCP]
     1.1 [term-printer.meeting.ietf.org:631, Wi-Fi, TCP]
       1.1.1 [31.133.160.18:631, Wi-Fi, TCP]

4.1.2. Alternate Paths

   If a client has multiple network interfaces available to it, such as
   a mobile client with both Wi-Fi and Cellular connectivity, it can
   attempt a connection over either interface. This represents a branch
   point in the connection establishment. As with derived endpoints,
   the interfaces should be ranked based on preference, system policy,
   and performance. Attempts should be started on one interface, and
   then on other interfaces successively after delays based on expected
   round-trip-time or other available metrics.

   1 [192.0.2.1:80, Any, TCP]
     1.1 [192.0.2.1:80, Wi-Fi, TCP]
     1.2 [192.0.2.1:80, LTE, TCP]

   This same approach applies to any situation in which the client is
   aware of multiple links or views of the network. Multiple Paths,
   each with a coherent set of addresses, routes, DNS server, and more,
   may share a single interface. A path may also represent a virtual
   interface service such as a Virtual Private Network (VPN).

   The list of available paths should be constrained by any requirements
   or prohibitions the application sets, as well as system policy.

4.1.3. Protocol Options

   Differences in possible protocol compositions and options can also
   provide a branching point in connection establishment. This allows
   clients to be resilient to situations in which a certain protocol is
   not functioning on a server or network.

   This approach is commonly used for connections with optional proxy
   server configurations. A single connection may be allowed to use an
   HTTP-based proxy, a SOCKS-based proxy, or connect directly. These
   options should be ranked and attempted in succession.

   1 [www.example.com:80, Any, HTTP/TCP]
     1.1 [192.0.2.8:80, Any, HTTP/HTTP Proxy/TCP]
     1.2 [192.0.2.7:10234, Any, HTTP/SOCKS/TCP]
     1.3 [www.example.com:80, Any, HTTP/TCP]
       1.3.1 [192.0.2.1:80, Any, HTTP/TCP]

   This approach also allows a client to attempt different sets of
   application and transport protocols that may provide preferable
   characteristics when available. For example, the protocol options
   could involve QUIC [I-D.ietf-quic-transport] over UDP on one branch,
   and HTTP/2 [RFC7540] over TLS over TCP on the other:

   1 [www.example.com:443, Any, Any HTTP]
     1.1 [www.example.com:443, Any, QUIC/UDP]
       1.1.1 [192.0.2.1:443, Any, QUIC/UDP]
     1.2 [www.example.com:443, Any, HTTP2/TLS/TCP]
       1.2.1 [192.0.2.1:443, Any, HTTP2/TLS/TCP]

   Another example is racing SCTP with TCP:

   1 [www.example.com:80, Any, Any Stream]
     1.1 [www.example.com:80, Any, SCTP]
       1.1.1 [192.0.2.1:80, Any, SCTP]
     1.2 [www.example.com:80, Any, TCP]
       1.2.1 [192.0.2.1:80, Any, TCP]

   Implementations that support racing protocols and protocol options
   SHOULD maintain a history of which protocols and protocol options
   successfully established, on a per-network basis. This information
   can influence future racing decisions to prioritize or prune
   branches.

4.2. Branching Order-of-Operations

   Branch types must occur in a specific order relative to one another
   to avoid creating leaf nodes with invalid or incompatible settings.
   In the earlier www.example.com example, it would be invalid to
   branch for derived endpoints (the DNS results for www.example.com)
   before branching between interface paths, since usable DNS results
   on one network may not necessarily be the same as DNS results on
   another network due to local network entities, supported address
   families, or enterprise network configurations. Implementations must
   be careful to branch in an order that results in usable leaf nodes
   whenever there are multiple branch types that could be used from a
   single node.

   The order of operations for branching, where lower numbers are acted
   upon first, SHOULD be:

   1. Alternate Paths

   2. Protocol Options

   3. Derived Endpoints

   Branching between paths is the first in the list because results
   across multiple interfaces are likely not related to one another:
   endpoint resolution may return different results, especially when
   using locally resolved host and service names, and which protocols
   are supported and preferred may differ across interfaces. Thus, if
   multiple paths are attempted, the overall connection can be seen as a
   race between the available paths or interfaces.

   Protocol options are checked next in order. Whether or not a set of
   protocols, or protocol-specific options, can successfully connect is
   generally not dependent on which specific IP address is used.
   Furthermore, the protocol stacks being attempted may influence or
   altogether change the endpoints being used. Adding a proxy to a
   connection's branch will change the endpoint to the proxy's IP
   address or hostname. Choosing an alternate protocol may also modify
   the ports that should be selected.

   Branching for derived endpoints is the final step, and may have
   multiple layers of derivation or resolution, such as DNS service
   resolution and DNS hostname resolution.

5. Connection Establishment Dynamics

   The primary goal of the connection establishment process is to
   successfully negotiate a protocol stack to an endpoint over an
   interface--to connect a single leaf node of the tree--with as little
   delay and as few unnecessary connection attempts as possible.
   Optimizing these two factors improves the user experience, while
   minimizing network load.

   This section covers the dynamic aspect of connection establishment.
   While the tree described above is a useful conceptual and
   architectural model, an implementation does not know what the full
   tree may become up front, nor will many of the possible branches be
   used in the common case.

5.1. Building the Tree

   The tree of options is built dynamically, out from the original trunk
   node. Any time that a connection attempt may be made directly to an
   endpoint without further derivation, and without needing to try
   alternate paths or protocol options that have not yet been covered by
   previous branches, the implementation SHOULD treat this as a leaf
   node and connect directly. Any time that an implementation chooses
   to branch between multiple options, it SHOULD determine a preferred
   order between the child nodes based on system policy, expected or
   historical performance, and application preference.

   When multiple paths are available, and permitted by the system's
   policy, the implementation SHOULD branch between the various paths.
   The list SHOULD be sorted based on the system policies and routes
   (which often determine a "default" interface), preferences expressed
   by the application, and expected performance based on measured or
   advertised properties of each path.

   When multiple protocol options are allowed by an application, and the
   system and implementation identify valid sets of protocols and
   protocol options, the implementation SHOULD branch between these
   sets.
   This list SHOULD be sorted based on application preference and
   expected performance, generally measured in terms of latency and
   bandwidth.

   An implementation will only branch to derive endpoints when
   necessary. This step involves the most external information, as
   endpoint derivation is often a process that requires fetching
   information from the network. Before branching, an implementation
   must first generate the list of derived endpoints. Once this list is
   sufficiently populated to continue, the implementation SHOULD sort
   the list based on preference and expected performance. When these
   derived endpoints are IP addresses, implementations SHOULD use the
   algorithm in [RFC6724] to sort the addresses. In cases where
   additional information can become available after the initial tree
   has been constructed, the implementation SHOULD update the tree to
   reflect new information and orderings if none of the leaf nodes are
   fully established.

5.2. Racing Methods

   There are three different approaches to racing the attempts for
   different nodes of the connection establishment tree:

   1. Immediate

   2. Delayed

   3. Failover

   Each approach is appropriate in different use-cases and branch types.
   However, to avoid consuming unnecessary network resources,
   implementations SHOULD NOT use immediate racing as a default
   approach.

   The timing algorithms for racing SHOULD remain independent across
   branches of the tree. Timers and racing logic are isolated to a
   given parent node, and are not ordered precisely with regard to the
   children of other nodes.

5.2.1. Delayed Racing

   Delayed racing can be used whenever a single node of the tree has
   multiple child nodes. Based on the order determined when building
   the tree, the first child node will be initiated immediately,
   followed by the next child node after some delay.
   Once that second child node is initiated, the third child node (if
   present) will begin after another delay, and so on until all child
   nodes have been initiated, or one of the child nodes successfully
   completes its negotiation.

   Delayed racing attempts occur in parallel. Implementations SHOULD
   NOT terminate an earlier child connection attempt upon starting a
   secondary child.

   The delay between starting child nodes SHOULD be based on the
   properties of the previously started child node. For example, if the
   first child represents an IP address with a known route, and the
   second child represents another IP address, the delay between
   starting the first and second IP addresses can be based on the
   expected retransmission cadence for the first child's connection
   (derived from historical round-trip-time). Alternatively, if the
   first child represents a branch on a Wi-Fi interface, and the second
   child represents a branch on an LTE interface, the delay should be
   based on the expected time in which the branch for the first
   interface would be able to establish a connection, based on link
   quality and historical round-trip-time.

   Any delay SHOULD have a defined minimum and maximum value based on
   the branch type. Generally, branches between paths and protocols
   should have longer delays than branches between derived endpoints.
   The maximum delay should be considered with regard to how long a
   user is expected to wait for the connection to complete.

   If a child node fails to connect before the delay timer has fired for
   the next child, the next child SHOULD be started immediately.

5.2.2. Failover

   If an implementation or application has a strong preference for one
   branch over another, the branching node may choose to wait until one
   child has failed before starting the next.
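
   As a rough illustration of these two racing methods, the following
   Python sketch computes when each child of a single parent node would
   be started. It is a simulation under stated assumptions rather than
   an implementation: each child's outcome and duration are supplied in
   advance, times are integer milliseconds, the Attempt type, the race()
   function, and the 10 ms and 2000 ms bounds are hypothetical, an early
   start is triggered only by failure of the most recently started
   child, and failover is modeled by setting delay_after to None.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Attempt:
    name: str
    delay_after: Optional[int]  # ms to wait before the next child; None = failover
    finishes_in: int            # ms from start until success or failure
    succeeds: bool


def race(children: List[Attempt], min_delay: int, max_delay: int):
    """Return (start times, earliest success) for one parent node."""
    starts: List[Tuple[str, int]] = []
    winner: Optional[Tuple[int, str]] = None  # (finish time, name)
    now = 0
    for child in children:
        if winner is not None and winner[0] <= now:
            break  # a started child has already connected
        starts.append((child.name, now))
        done = now + child.finishes_in
        if child.succeeds and (winner is None or done < winner[0]):
            winner = (done, child.name)
        if child.delay_after is None:
            # Failover: the next child waits for this child's outcome.
            now = done
        else:
            # Delayed racing: clamp the delay to the allowed range.
            timer = now + max(min_delay, min(max_delay, child.delay_after))
            # A failure before the timer fires starts the next child early.
            now = done if (not child.succeeds and done < timer) else timer
    return starts, winner


children = [
    Attempt("1.1", 250, 50, False),   # fails quickly
    Attempt("1.2", 250, 400, True),
    Attempt("1.3", 250, 300, True),
    Attempt("1.4", 250, 100, True),   # never started in this run
]
starts, winner = race(children, min_delay=10, max_delay=2000)
print(starts, winner)
```

   In this run, child 1.2 starts at 50 ms because 1.1 failed before its
   250 ms timer fired, and child 1.4 is never started because 1.2
   connects first. A real implementation would drive the same decisions
   from timers and connection events rather than from known outcomes.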
   Failure of a leaf node is determined by its protocol negotiation
   failing or timing out; failure of a parent branching node is
   determined by all of its children failing.

   An example in which failover is recommended is a race between a
   protocol stack that uses a proxy and a protocol stack that bypasses
   the proxy. Failover is useful in case the proxy is down or
   misconfigured, but any more aggressive type of racing may end up
   unnecessarily avoiding a proxy that was preferred by policy.

5.3. Completing Establishment

   The process of connection establishment completes when one leaf node
   of the tree has completed negotiation with the remote endpoint
   successfully, or else all nodes of the tree have failed to connect.
   The first leaf node to complete its connection is then used by the
   application to send and receive data.

   It is useful to process success and failure throughout the tree by
   child nodes reporting to their parent nodes (towards the trunk of the
   tree). For example, in the following case, if 1.1.1 fails to
   connect, it reports the failure to 1.1. Since 1.1 has no other child
   nodes, it also has failed and reports that failure to 1. Because 1.2
   has not yet failed, 1 is not considered to have failed. Since 1.2
   has not yet started, it is started and the process continues.
   Similarly, if 1.1.1 successfully connects, then it marks 1.1 as
   connected, which propagates to the trunk node 1. At this point, the
   connection as a whole is considered to be successfully connected and
   ready to process application data.

   1 [www.example.com:80, Any, TCP]
     1.1 [www.example.com:80, Wi-Fi, TCP]
       1.1.1 [192.0.2.1:80, Wi-Fi, TCP]
     1.2 [www.example.com:80, LTE, TCP]
     ...

   If a leaf node has successfully completed its connection, all other
   attempts SHOULD be made ineligible for use by the application for the
   original request.
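
   The reporting of success and failure towards the trunk might be
   sketched as follows. The AttemptNode class, the State values, and
   the method names are hypothetical and chosen for illustration; the
   sketch assumes that all children of a parent are already known when
   a failure is reported.

```python
from enum import Enum
from typing import List, Optional


class State(Enum):
    PENDING = "pending"
    CONNECTED = "connected"
    FAILED = "failed"


class AttemptNode:
    """A node that reports its outcome toward the trunk of the tree."""

    def __init__(self, label: str, parent: Optional["AttemptNode"] = None):
        self.label = label
        self.parent = parent
        self.children: List["AttemptNode"] = []
        self.state = State.PENDING
        if parent is not None:
            parent.children.append(self)

    def mark_connected(self) -> None:
        # Success of any child marks its parent connected, up to the trunk.
        self.state = State.CONNECTED
        if self.parent is not None and self.parent.state is State.PENDING:
            self.parent.mark_connected()

    def mark_failed(self) -> None:
        # A parent fails only once every one of its children has failed.
        self.state = State.FAILED
        parent = self.parent
        if parent is not None and parent.state is State.PENDING and all(
            child.state is State.FAILED for child in parent.children
        ):
            parent.mark_failed()


trunk = AttemptNode("1")
wifi = AttemptNode("1.1", trunk)
wifi_v4 = AttemptNode("1.1.1", wifi)
lte = AttemptNode("1.2", trunk)

wifi_v4.mark_failed()            # 1.1.1 fails, so 1.1 fails; 1 stays pending
state_after_failure = trunk.state

lte_v4 = AttemptNode("1.2.1", lte)
lte_v4.mark_connected()          # propagates through 1.2 to the trunk
print(state_after_failure.value, trunk.state.value)
```

   Here the failure of 1.1.1 stops at the trunk because 1.2 is still
   pending, while the later success of 1.2.1 reaches the trunk and
   would be the point at which the application is notified.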
   New connection attempts that involve transmitting data on the
   network SHOULD NOT be started after another leaf node has completed
   successfully, as the connection as a whole has been established. An
   implementation MAY choose to let certain handshakes and negotiations
   complete in order to gather metrics to influence future connections.
   Similarly, an implementation MAY choose to hold onto fully
   established leaf nodes that were not the first to establish for use
   in future connections, but this approach is not recommended since
   those attempts were slower to connect and may exhibit less desirable
   properties.

5.3.1. Determining Successful Establishment

   Implementations may select the criteria by which a leaf node is
   considered to be successfully connected differently on a per-protocol
   basis. If the only protocol being used is a transport protocol with
   a clear handshake, like TCP, then the obvious choice is to declare
   that node "connected" when the three-way handshake has completed. If
   the only protocol being used is an "unconnected" protocol, like UDP,
   the implementation may consider the node fully "connected" the moment
   it determines a route is present, before sending any packets on the
   network.

   For protocol stacks with multiple handshakes, the decision becomes
   more nuanced. If the protocol stack involves both TLS and TCP, an
   implementation MAY determine that a leaf node is connected after the
   TCP handshake is complete, or it MAY wait for the TLS handshake to
   complete as well. The benefit of declaring completion when the TCP
   handshake finishes, and thus stopping the race for other branches of
   the tree, is that there will be less burden on the network from other
   connection attempts.
   On the other hand, by waiting until the TLS handshake is complete,
   an implementation avoids the scenario in which a TCP handshake
   completes quickly, but TLS negotiation is either very slow or fails
   altogether in particular network conditions or to a particular
   endpoint.

6. API Considerations

   In general, the internal states and nodes of racing connection
   establishment do not need to be exposed to applications. Instead,
   this process SHOULD be treated as an abstraction of a single,
   aggregate connection establishment behind an API. This places some
   requirements on the API, including:

   o  The API must allow the application to specify an unresolved
      endpoint as the remote side of the connection, such as a URI or
      hostname + port. The application should also be able to provide
      constraints on path selection and protocol features.

   o  Read and write operations cannot take effect until one leaf node
      has been chosen as the connected node. The API needs to either
      expose asynchronous reads and writes, or else prohibit reads and
      writes until the connection is established.

   o  The action of starting or initiating the connection may involve
      many network-bound operations, so this operation SHOULD be
      asynchronous.

   o  Properties of the connection, such as the remote and local
      addresses, the interface used, and the protocols used, may not be
      queryable until the connection is established.

6.1. Handling 0-RTT Data

   Several protocols allow sending higher-level protocol or application
   data within the first packet of their protocol establishment, such as
   TCP Fast Open [RFC7413] and TLS 1.3 [I-D.ietf-tls-tls13]. This
   approach is referred to as sending Zero-RTT (0-RTT) data. This is a
   desirable property, but poses challenges to an implementation that
   uses racing during connection establishment.
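
   Before turning to the details of 0-RTT data, the general API
   requirements listed in Section 6 might be illustrated roughly as
   follows. This is a sketch, not a proposed interface: the
   AggregateConnection class, its methods, and the candidate tuples are
   hypothetical, and the race is simulated with timers rather than real
   network attempts.

```python
import asyncio
from typing import List, Optional, Tuple

# (address, interface, simulated seconds until the attempt completes)
Candidate = Tuple[str, str, float]


class AggregateConnection:
    """Hypothetical object hiding the racing tree behind one connection."""

    def __init__(self, hostname: str, port: int) -> None:
        # The application supplies an unresolved endpoint.
        self.hostname = hostname
        self.port = port
        self._winner: Optional[Tuple[str, str]] = None
        self._pending: List[bytes] = []
        self.sent: List[bytes] = []  # stand-in for data given to the winner

    @property
    def remote_address(self) -> Optional[str]:
        # Not queryable (None) until the connection is established.
        return self._winner[0] if self._winner else None

    def send(self, data: bytes) -> None:
        # Writes are queued until one leaf node has been chosen.
        if self._winner is None:
            self._pending.append(data)
        else:
            self.sent.append(data)

    async def start(self, candidates: List[Candidate]) -> None:
        """Asynchronously race the (simulated) candidates."""

        async def attempt(address: str, interface: str, delay: float):
            await asyncio.sleep(delay)  # placeholder for a real handshake
            return address, interface

        tasks = [asyncio.create_task(attempt(*c)) for c in candidates]
        done, pending = await asyncio.wait(
            tasks, return_when=asyncio.FIRST_COMPLETED
        )
        for task in pending:
            task.cancel()  # losing attempts become ineligible
        await asyncio.gather(*pending, return_exceptions=True)
        self._winner = next(iter(done)).result()
        # Flush writes that were queued before establishment.
        self.sent.extend(self._pending)
        self._pending.clear()


conn = AggregateConnection("www.example.com", 80)
conn.send(b"hello")  # queued, nothing is connected yet
asyncio.run(conn.start([("192.0.2.1", "Wi-Fi", 0.2),
                        ("192.0.2.2", "LTE", 0.01)]))
print(conn.remote_address, conn.sent)
```

   Queuing writes is only one of the two options named in the list; an
   API could equally reject writes until establishment completes.
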

   If the application has 0-RTT data to send in any protocol handshakes,
   it needs to provide this data before the handshakes have begun. When
   racing, this means that the data SHOULD be provided before the
   process of connection establishment has begun. If the API allows the
   application to send 0-RTT data, it MUST provide an interface that
   identifies this data as idempotent data. In general, 0-RTT data may
   be replayed (for example, if a TCP SYN contains data, and the SYN is
   retransmitted, the data will be retransmitted as well), but racing
   means that different leaf nodes have the opportunity to send the same
   data independently. If data is truly idempotent, this should be
   permissible.

   Once the application has provided its 0-RTT data, an implementation
   SHOULD keep a copy of this data and provide it to each new leaf node
   that is started and for which a 0-RTT protocol is being used.

   It is also possible that protocol stacks within a particular leaf
   node use 0-RTT handshakes without any idempotent application data.
   For example, TCP Fast Open could use a TLS Client Hello as its 0-RTT
   data, shortening the cumulative handshake time.

   0-RTT handshakes often rely on previous state, such as TCP Fast Open
   cookies, previously established TLS tickets, or out-of-band
   distributed pre-shared keys (PSKs). Implementations should be aware
   of security concerns around using these tokens across multiple
   addresses or paths when racing. In the case of TLS, any given ticket
   or PSK SHOULD only be used on one leaf node. If implementations have
   multiple tickets available from a previous connection, each leaf node
   attempt MUST use a different ticket. In effect, each leaf node will
   send the same early application data, yet encoded (encrypted)
   differently on the wire.

7. Security Considerations

   See Section 6.1 for security considerations around racing with 0-RTT
   data.
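
   As an illustration of the ticket handling that Section 6.1 calls
   for, the following sketch gives each leaf node its own ticket and a
   copy of the application's idempotent data. The function name, the
   use of strings as tickets, and the choice to fall back to a full
   handshake when tickets run out are assumptions made for this
   example, not requirements of this document.

```python
from typing import Dict, List, Optional, Tuple


def plan_early_data(
    leaf_names: List[str],
    tickets: List[str],
    idempotent_data: bytes,
) -> Dict[str, Optional[Tuple[str, bytes]]]:
    """Give every leaf a distinct ticket plus a copy of the 0-RTT data.

    Leaves that cannot get an unused ticket map to None, meaning they
    would perform a full handshake without early data (an assumption
    of this sketch).
    """
    unused = list(tickets)  # never hand out the same ticket twice
    plan: Dict[str, Optional[Tuple[str, bytes]]] = {}
    for name in leaf_names:
        if unused:
            # bytes(...) makes the kept copy explicit for each leaf.
            plan[name] = (unused.pop(0), bytes(idempotent_data))
        else:
            plan[name] = None
    return plan


plan = plan_early_data(
    ["1.1.1", "1.1.2", "1.2.1"],
    ["ticket-A", "ticket-B"],
    b"GET / HTTP/1.1\r\nHost: www.example.com\r\n\r\n",
)
for leaf, assignment in plan.items():
    print(leaf, assignment[0] if assignment else "full handshake")
```

   With two tickets and three leaf nodes, the third leaf is left
   without early data rather than reusing a ticket on a second path.
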

   An attacker that knows a particular device is racing several options
   during connection establishment may be able to block packets for the
   first connection attempt, thus inducing the device to fall back to a
   secondary attempt. This is a problem if the secondary attempts have
   worse security properties that enable further attacks.
   Implementations should ensure that all options have equivalent
   security properties to avoid incentivizing attacks.

   Since results from the network can determine how a connection attempt
   tree is built, such as when DNS returns a list of resolved endpoints,
   it is possible for the network to cause an implementation to consume
   significant on-device resources. Implementations SHOULD limit the
   maximum amount of state allowed for any given node, including the
   number of child nodes, especially when the state is based on results
   from the network.

8. IANA Considerations

   This document makes no requests of IANA.

9. Acknowledgments

   Thanks to Josh Graessley and Stuart Cheshire for their help in the
   design of the original implementation of Happy Eyeballs for Apple
   that began this work.

10. Informative References

   [I-D.ietf-quic-transport]
              Iyengar, J. and M. Thomson, "QUIC: A UDP-Based Multiplexed
              and Secure Transport", draft-ietf-quic-transport-07 (work
              in progress), October 2017.

   [I-D.ietf-tls-tls13]
              Rescorla, E., "The Transport Layer Security (TLS) Protocol
              Version 1.3", draft-ietf-tls-tls13-21 (work in progress),
              July 2017.

   [I-D.ietf-v6ops-rfc6555bis]
              Schinazi, D. and T. Pauly, "Happy Eyeballs Version 2:
              Better Connectivity Using Concurrency", draft-ietf-v6ops-
              rfc6555bis-06 (work in progress), October 2017.

   [RFC6724]  Thaler, D., Ed., Draves, R., Matsumoto, A., and T. Chown,
              "Default Address Selection for Internet Protocol Version 6
              (IPv6)", RFC 6724, DOI 10.17487/RFC6724, September 2012.

   [RFC7413]  Cheng, Y., Chu, J., Radhakrishnan, S., and A. Jain, "TCP
              Fast Open", RFC 7413, DOI 10.17487/RFC7413, December 2014.

   [RFC7540]  Belshe, M., Peon, R., and M. Thomson, Ed., "Hypertext
              Transfer Protocol Version 2 (HTTP/2)", RFC 7540,
              DOI 10.17487/RFC7540, May 2015.

   [RFC7556]  Anipko, D., Ed., "Multiple Provisioning Domain
              Architecture", RFC 7556, DOI 10.17487/RFC7556, June 2015.

Author's Address

   Tommy Pauly
   Apple Inc.
   1 Infinite Loop
   Cupertino, California 95014
   United States of America

   Email: tpauly@apple.com