Network Working Group                                           T. Pauly
Internet-Draft                                                Apple Inc.
Intended status: Standards Track                        October 24, 2017
Expires: April 27, 2018

         Guidelines for Racing During Connection Establishment
                   draft-pauly-taps-guidelines-01

Abstract

   Often, connections created across the Internet have multiple options
   for how to communicate: address families, specific IP addresses,
   network attachments, and application and transport protocols.  This
   document describes how an implementation can race multiple options
   during connection establishment and expose this functionality
   through an API.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 27, 2018.

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
     2.1.  Endpoint
     2.2.  Derived Endpoint
     2.3.  Path
     2.4.  Connection
   3.  Connection Establishment Overview
   4.  Structuring Options as a Tree
     4.1.  Branch Types
       4.1.1.  Derived Endpoints
       4.1.2.  Alternate Paths
       4.1.3.  Protocol Options
     4.2.  Branching Order-of-Operations
   5.  Connection Establishment Dynamics
     5.1.  Building the Tree
     5.2.  Racing Methods
       5.2.1.  Delayed Racing
       5.2.2.  Failover
     5.3.  Completing Establishment
       5.3.1.  Determining Successful Establishment
   6.  API Considerations
     6.1.  Handling 0-RTT Data
   7.  Security Considerations
   8.  IANA Considerations
   9.  Acknowledgments
   10. Informative References
   Author's Address

1.  Introduction

   Often, connections created across the Internet have multiple options
   for how to communicate: address families, specific IP addresses,
   network attachments, and application and transport protocols.  If an
   application chooses to attempt only one of these options, it may fail
   to connect, or end up using a suboptimal path.  If an application
   chooses to attempt one option after another, waiting for each to fail
   or time out, a user of the application may need to wait for a very
   long time before progress is made.  And, if an application
   simultaneously attempts all options, it may unnecessarily consume
   significant local or network resources.

   To address this, applications can employ a method of racing their
   various connection establishment options.  This approach is
   commonly used for racing multiple IP address families, the algorithm
   for which is referred to as "Happy Eyeballs"
   [I-D.ietf-v6ops-rfc6555bis].  However, the approach can apply more
   generally.

   This document describes how an implementation can race multiple
   options during connection establishment and expose this
   functionality through an API.

2.  Terminology

   This document uses specific terminology when discussing connection
   establishment.

2.1.  Endpoint

   An identifier for a network service.  Generally, there is a concept
   of both a local and a remote endpoint.  Endpoints are the targets of
   network connections.  If an endpoint of a given type cannot be
   directly used, it should be resolved into one or more endpoints of
   another type.  Examples of endpoint types include:

   o  IP address + port

   o  Hostname + port

   o  Service name + type + domain

   o  URI

2.2.  Derived Endpoint

   A derived endpoint is an endpoint that is not the original target of
   an API client, but an endpoint created from the original endpoint
   through transformation or lookup.  Derivation may take the form of
   hostname resolution into addresses, synthesis between address types,
   or changing to a different endpoint entirely based on a configuration
   requirement.  For example, if a proxy server must be used for a
   connection, the endpoint that represents the proxy is a derived
   endpoint.

2.3.  Path

   A view of network properties that can be used to communicate with an
   endpoint from the current system.  This is sometimes referred to as a
   Provisioning Domain (PvD) [RFC7556].  The path may include properties
   of the addresses and routes being used, the network interfaces being
   used, and other metadata about the network learned from configuration
   or negotiation.

2.4.  Connection

   A flow of data between two endpoints.  A connection is created with a
   target remote endpoint and a set of parameters indicating client
   preferences for path selection and protocol options.

3.  Connection Establishment Overview

   The process of establishing a network connection begins when an
   application expresses intent to communicate with a remote endpoint
   (along with any constraints or requirements it may have on the
   connection).  The process can be considered complete once there is at
   least one set of network protocols that has completed any required
   setup to the point that it can transmit and receive the application's
   data.

   Looking more closely, connection establishment has three required
   steps that must be performed by some entity on a system:

   1.  Identifying the endpoint to which the connection should be
       established

   2.  Choosing which path or interface to use

   3.  Conducting the necessary set of protocol handshakes to establish
       the connection

   The simplest example of this process might involve identifying the
   single IP address to which the application wishes to connect, using
   the system's current default interface or path, and starting a TCP
   handshake to establish a stream to the specified IP address.
   However, each step may also vary depending on the requirements of the
   connection: if the endpoint is defined as a hostname and port, then
   there may be multiple resolved addresses that are available; there
   may also be multiple interfaces or paths available, other than the
   default system interface; and some protocols may not need any
   transport handshake to be considered "established" (such as UDP),
   while other connections may utilize layered protocol handshakes, such
   as TLS over TCP.

   Whenever an application has multiple options for connection
   establishment, it can view the set of all individual connection
   establishment options as a single, aggregate connection
   establishment.  The aggregate set conceptually includes every valid
   combination of endpoints, paths, and protocols.  As an example,
   consider an application that initiates a TCP connection to a hostname
   + port endpoint, and has two valid interfaces available (Wi-Fi and
   LTE).  The hostname resolves to a single IPv4 address on the Wi-Fi
   network, and resolves to the same IPv4 address on the LTE network, as
   well as a single IPv6 address.  The aggregate set of connection
   establishment options can be viewed as follows:

Aggregate [Endpoint: www.example.com:80] [Interface: Any]   [Protocol: TCP]
      |-> [Endpoint: 192.0.2.1:80]       [Interface: Wi-Fi] [Protocol: TCP]
      |-> [Endpoint: 192.0.2.1:80]       [Interface: LTE]   [Protocol: TCP]
      |-> [Endpoint: [2001:DB8::1]:80]   [Interface: LTE]   [Protocol: TCP]
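   Once each path's resolution results are known, the aggregate set can be enumerated mechanically.  The following Python sketch is illustrative only.  It assumes the hostname has already been resolved separately on each path; the `resolved` mapping is hypothetical sample data mirroring the example above.  `Option` and `aggregate_options` are hypothetical names, not an API defined by this document.

```python
from typing import NamedTuple


class Option(NamedTuple):
    """One fully specified connection establishment option."""
    endpoint: str
    path: str
    protocol: str


def aggregate_options(resolved: dict[str, list[str]],
                      protocols: list[str]) -> list[Option]:
    """Return every valid (endpoint, path, protocol) combination.

    `resolved` maps each path to the endpoints the hostname resolved to on
    that path, so an endpoint is only paired with the path that produced it.
    """
    return [
        Option(endpoint, path, protocol)
        for path, endpoints in resolved.items()
        for endpoint in endpoints
        for protocol in protocols
    ]


# Hypothetical per-path resolution results for www.example.com:80.
resolved = {
    "Wi-Fi": ["192.0.2.1:80"],
    "LTE": ["192.0.2.1:80", "[2001:DB8::1]:80"],
}
for option in aggregate_options(resolved, ["TCP"]):
    print(option)
```

   Keying endpoints by path keeps the IPv6 address from being paired with Wi-Fi, where it was never returned.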

   Any one of these sub-entries on the aggregate connection attempt
   would satisfy the original application intent.  The concern of this
   document is the algorithm defining which of these options to try,
   when, and in what order.

4.  Structuring Options as a Tree

   When an implementation responsible for connection establishment needs
   to consider multiple options, it SHOULD logically structure these
   options as a hierarchical tree.  Each leaf node of the tree
   represents a single, coherent connection attempt, with an Endpoint, a
   Path, and a set of protocols that can directly negotiate and send
   data on the network.  Each node in the tree that is not a leaf
   represents a connection attempt that is either underspecified, or
   else includes multiple distinct options.  For example, when
   connecting on an IP network, a connection attempt to a hostname and
   port is underspecified, because the connection attempt requires a
   resolved IP address as its remote endpoint.  In this case, the node
   represented by the connection attempt to the hostname is a parent
   node, with child nodes for each IP address.  Similarly, an
   application that is allowed to connect using multiple interfaces will
   have a parent node of the tree for the decision between the paths,
   with a branch for each interface.

   The example aggregate connection attempt above can be drawn as a tree
   by grouping the addresses resolved on the same interface into
   branches:

                             ||
                +==========================+
                |  www.example.com:80/Any  |
                +==========================+
                  //                    \\
+==========================+       +==========================+
| www.example.com:80/Wi-Fi |       |  www.example.com:80/LTE  |
+==========================+       +==========================+
             ||                      //                    \\
  +====================+  +====================+  +======================+
  | 192.0.2.1:80/Wi-Fi |  |  192.0.2.1:80/LTE  |  | [2001:DB8::1]:80/LTE |
  +====================+  +====================+  +======================+

   The rest of this document will use a notation scheme to represent
   this tree.  The parent (or trunk) node of the tree will be
   represented by a single integer, such as "1".  Each child of that
   node will have an integer that identifies it, from 1 to the number of
   children.  That child node will be uniquely identified by
   concatenating its integer to its parent's identifier with a dot in
   between, such as "1.1" and "1.2".  Each node will be summarized by a
   tuple of three elements: Endpoint, Path, and Protocol.  The above
   example can now be written more succinctly as:

   1 [www.example.com:80, Any, TCP]
     1.1 [www.example.com:80, Wi-Fi, TCP]
       1.1.1 [192.0.2.1:80, Wi-Fi, TCP]
     1.2 [www.example.com:80, LTE, TCP]
       1.2.1 [192.0.2.1:80, LTE, TCP]
       1.2.2 [[2001:DB8::1]:80, LTE, TCP]
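   The dotted notation follows from a simple recursive walk.  This minimal Python sketch assumes a hypothetical `Node` type that stores the (Endpoint, Path, Protocol) tuple and an ordered list of children.  It rebuilds the example tree and renders it in the notation above, indenting two spaces per level.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A node in the connection establishment tree (Endpoint, Path, Protocol)."""
    endpoint: str
    path: str
    protocol: str
    children: list[Node] = field(default_factory=list)

    def add(self, endpoint: str, path: str, protocol: str) -> Node:
        """Append a child in preference order and return it."""
        child = Node(endpoint, path, protocol)
        self.children.append(child)
        return child


def render(node: Node, ident: str = "1", depth: int = 0) -> list[str]:
    """A child's identifier is its parent's identifier, a dot, and its
    1-based position among its siblings."""
    lines = [f"{'  ' * depth}{ident} "
             f"[{node.endpoint}, {node.path}, {node.protocol}]"]
    for index, child in enumerate(node.children, start=1):
        lines.extend(render(child, f"{ident}.{index}", depth + 1))
    return lines


root = Node("www.example.com:80", "Any", "TCP")
wifi = root.add("www.example.com:80", "Wi-Fi", "TCP")
wifi.add("192.0.2.1:80", "Wi-Fi", "TCP")
lte = root.add("www.example.com:80", "LTE", "TCP")
lte.add("192.0.2.1:80", "LTE", "TCP")
lte.add("[2001:DB8::1]:80", "LTE", "TCP")
print("\n".join(render(root)))
```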

   When an application views this aggregate set of connection attempts
   as a single connection establishment, it will use only one of the
   leaf nodes to transfer data.  Thus, when a single leaf node becomes
   ready to use, the entire connection attempt is ready for use by the
   application.  Another way to represent this is that every leaf node
   updates the state of its parent node when it becomes ready, until
   the trunk node of the tree is ready, which then notifies the
   application that the connection as a whole is ready to use.

   A connection establishment tree may be degenerate and have only a
   single leaf node, such as a connection attempt to an IP address over
   a single interface with a single protocol.

   1 [192.0.2.1:80, Wi-Fi, TCP]

   A parent node may also have only one child (or leaf) node, such as
   when a hostname resolves to only a single IP address.

   1 [www.example.com:80, Wi-Fi, TCP]
     1.1 [192.0.2.1:80, Wi-Fi, TCP]

4.1.  Branch Types

   There are three types of branching from a parent node into one or
   more child nodes.  Any parent node of the tree MUST use only one type
   of branching.

4.1.1.  Derived Endpoints

   If a connection originally targets a single endpoint, there may be
   multiple endpoints of different types that can be derived from the
   original.  The connection library should order the derived endpoints
   according to application preference and expected performance.

   DNS hostname-to-address resolution is the most common method of
   endpoint derivation.  When trying to connect to a hostname endpoint
   on a traditional IP network, the implementation SHOULD send DNS
   queries for both A (IPv4) and AAAA (IPv6) records if both are
   supported on the local link.  The algorithm for ordering and racing
   these addresses SHOULD follow the recommendations in Happy Eyeballs
   [I-D.ietf-v6ops-rfc6555bis].

   1 [www.example.com:80, Wi-Fi, TCP]
     1.1 [[2001:DB8::1]:80, Wi-Fi, TCP]
     1.2 [192.0.2.1:80, Wi-Fi, TCP]
     1.3 [[2001:DB8::2]:80, Wi-Fi, TCP]
     1.4 [[2001:DB8::3]:80, Wi-Fi, TCP]
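   Part of the Happy Eyeballs ordering interleaves address families so that a slow or broken family does not hold up every attempt.  The example tree above follows that pattern.  The Python sketch below assumes the addresses are already sorted within each family (for example, using [RFC6724]).  It uses a first-family count of 1, mirroring the "First Address Family Count" in the published Happy Eyeballs Version 2 specification (RFC 8305).  The function name is hypothetical, and ports are omitted because the standard `ipaddress` module parses bare addresses only.

```python
import ipaddress
from itertools import zip_longest


def interleave_families(addresses: list[str],
                        first_family_count: int = 1) -> list[str]:
    """Interleave address families while keeping each family's own order.

    The family of the first (most preferred) address leads with
    `first_family_count` addresses; after that the two families alternate,
    starting with the other family.
    """
    if not addresses:
        return []
    lead_version = ipaddress.ip_address(addresses[0]).version
    lead = [a for a in addresses
            if ipaddress.ip_address(a).version == lead_version]
    other = [a for a in addresses
             if ipaddress.ip_address(a).version != lead_version]
    result = lead[:first_family_count]
    for pair in zip_longest(other, lead[first_family_count:]):
        result.extend(a for a in pair if a is not None)
    return result


print(interleave_families(
    ["2001:DB8::1", "2001:DB8::2", "2001:DB8::3", "192.0.2.1"]))
# ['2001:DB8::1', '192.0.2.1', '2001:DB8::2', '2001:DB8::3']
```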

   DNS-Based Service Discovery can also provide an endpoint derivation
   step.  When trying to connect to a named service, the client may
   discover one or more hostname and port pairs on the local network
   using multicast DNS.  These hostnames should each be treated as a
   branch that can be attempted independently from other hostnames.
   Each of these hostnames may also resolve to one or more addresses,
   thus creating multiple layers of branching.

   1 [term-printer._ipp._tcp.meeting.ietf.org, Wi-Fi, TCP]
     1.1 [term-printer.meeting.ietf.org:631, Wi-Fi, TCP]
       1.1.1 [31.133.160.18:631, Wi-Fi, TCP]

4.1.2.  Alternate Paths

   If a client has multiple network interfaces available to it, such as
   a mobile client with both Wi-Fi and cellular connectivity, it can
   attempt a connection over either interface.  This represents a branch
   point in the connection establishment.  As with derived endpoints,
   the interfaces should be ranked based on preference, system policy,
   and performance.  Attempts should be started on one interface, and
   then on other interfaces successively after delays based on expected
   round-trip time or other available metrics.

   1 [192.0.2.1:80, Any, TCP]
     1.1 [192.0.2.1:80, Wi-Fi, TCP]
     1.2 [192.0.2.1:80, LTE, TCP]

   This same approach applies to any situation in which the client is
   aware of multiple links or views of the network.  Multiple paths,
   each with a coherent set of addresses, routes, DNS servers, and more,
   may share a single interface.  A path may also represent a virtual
   interface service such as a Virtual Private Network (VPN).

   The list of available paths should be constrained by any requirements
   or prohibitions the application sets, as well as system policy.
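   Constraining and ranking paths might look like the following Python sketch.  It assumes a hypothetical `Path` record with three properties: an interface type, whether system routing policy treats the path as the default, and a measured or advertised round-trip time.  `prohibited_types` and `required_type` stand in for application requirements or prohibitions.  The ranking key puts the default route first, then lower RTT, with unknown RTT last.  That is one plausible policy, not one this document mandates.

```python
from collections.abc import Collection, Iterable
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Path:
    name: str
    interface_type: str        # hypothetical labels: "wifi", "cellular", "vpn"
    is_default: bool           # preferred by system routing policy
    rtt_ms: Optional[float]    # measured or advertised RTT, if known


def candidate_paths(paths: Iterable[Path],
                    prohibited_types: Collection[str] = frozenset(),
                    required_type: Optional[str] = None) -> list[Path]:
    """Filter paths by application constraints, then rank the survivors."""
    allowed = [
        p for p in paths
        if p.interface_type not in prohibited_types
        and (required_type is None or p.interface_type == required_type)
    ]
    # Default-route paths first, then lower RTT; unknown RTT sorts last.
    return sorted(allowed, key=lambda p: (not p.is_default,
                                          p.rtt_ms is None,
                                          p.rtt_ms or 0.0))


paths = [
    Path("LTE", "cellular", False, 60.0),
    Path("Wi-Fi", "wifi", True, 20.0),
    Path("VPN", "vpn", False, None),
]
print([p.name for p in candidate_paths(paths)])
print([p.name for p in candidate_paths(paths, prohibited_types={"cellular"})])
```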

4.1.3.  Protocol Options

   Differences in possible protocol compositions and options can also
   provide a branching point in connection establishment.  This allows
   clients to be resilient to situations in which a certain protocol is
   not functioning on a server or network.

   This approach is commonly used for connections with optional proxy
   server configurations.  A single connection may be allowed to use an
   HTTP-based proxy, use a SOCKS-based proxy, or connect directly.  These
   options should be ranked and attempted in succession.

   1 [www.example.com:80, Any, HTTP/TCP]
     1.1 [192.0.2.8:80, Any, HTTP/HTTP Proxy/TCP]
     1.2 [192.0.2.7:10234, Any, HTTP/SOCKS/TCP]
     1.3 [www.example.com:80, Any, HTTP/TCP]
       1.3.1 [192.0.2.1:80, Any, HTTP/TCP]

   This approach also allows a client to attempt different sets of
   application and transport protocols that may provide preferable
   characteristics when available.  For example, the protocol options
   could involve QUIC [I-D.ietf-quic-transport] over UDP on one branch,
   and HTTP/2 [RFC7540] over TLS over TCP on the other:

   1 [www.example.com:443, Any, Any HTTP]
     1.1 [www.example.com:443, Any, QUIC/UDP]
       1.1.1 [192.0.2.1:443, Any, QUIC/UDP]
     1.2 [www.example.com:443, Any, HTTP2/TLS/TCP]
       1.2.1 [192.0.2.1:443, Any, HTTP2/TLS/TCP]

   Another example is racing SCTP with TCP:

   1 [www.example.com:80, Any, Any Stream]
     1.1 [www.example.com:80, Any, SCTP]
       1.1.1 [192.0.2.1:80, Any, SCTP]
     1.2 [www.example.com:80, Any, TCP]
       1.2.1 [192.0.2.1:80, Any, TCP]

   Implementations that support racing protocols and protocol options
   SHOULD maintain a history of which protocols and protocol options
   successfully established, on a per-network basis.  This information
   can influence future racing decisions to prioritize or prune
   branches.
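   A per-network history could be as simple as success and failure counters keyed by network and protocol stack.  In this Python sketch, `ProtocolHistory`, the network identifiers, and the pruning threshold are all hypothetical.  It prunes stacks that have only ever failed on a network and otherwise prefers stacks with more recorded successes.

```python
from collections import defaultdict


class ProtocolHistory:
    """Per-network counts of protocol stack successes and failures."""

    def __init__(self) -> None:
        # network -> protocol stack -> [successes, failures]
        self._stats: dict[str, dict[str, list[int]]] = defaultdict(
            lambda: defaultdict(lambda: [0, 0]))

    def record(self, network: str, stack: str, succeeded: bool) -> None:
        self._stats[network][stack][0 if succeeded else 1] += 1

    def order(self, network: str, stacks: list[str],
              prune_after: int = 3) -> list[str]:
        """Reorder `stacks` for `network` using recorded outcomes.

        Stacks that never succeeded and failed at least `prune_after` times
        are pruned; the rest are sorted by descending success count, and the
        stable sort keeps the caller's order for ties and unknown stacks.
        """
        stats = self._stats.get(network, {})
        kept = []
        for stack in stacks:
            successes, failures = stats.get(stack, (0, 0))
            if successes == 0 and failures >= prune_after:
                continue  # only ever failed on this network: prune it
            kept.append(stack)
        return sorted(kept, key=lambda s: -stats.get(s, (0, 0))[0])


history = ProtocolHistory()
for _ in range(3):
    history.record("office-wifi", "QUIC/UDP", succeeded=False)
history.record("office-wifi", "HTTP2/TLS/TCP", succeeded=True)
print(history.order("office-wifi", ["QUIC/UDP", "HTTP2/TLS/TCP"]))
print(history.order("home-wifi", ["QUIC/UDP", "HTTP2/TLS/TCP"]))
```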

4.2.  Branching Order-of-Operations

   Branch types must occur in a specific order relative to one another
   to avoid creating leaf nodes with invalid or incompatible settings.
   In the Wi-Fi and LTE example from Section 4, it would be invalid to
   branch for derived endpoints (the DNS results for www.example.com)
   before branching between interface paths, since usable DNS results on
   one network may not necessarily be the same as DNS results on another
   network due to local network entities, supported address families,
   or enterprise network configurations.  Implementations must be
   careful to branch in an order that results in usable leaf nodes
   whenever there are multiple branch types that could be used from a
   single node.

   The order of operations for branching, where lower numbers are acted
   upon first, SHOULD be:

   1.  Alternate Paths

   2.  Protocol Options

   3.  Derived Endpoints

   Branching between paths is the first in the list because results
   across multiple interfaces are likely not related to one another:
   endpoint resolution may return different results, especially when
   using locally resolved host and service names, and which protocols
   are supported and preferred may differ across interfaces.  Thus, if
   multiple paths are attempted, the overall connection can be seen as a
   race between the available paths or interfaces.

   Protocol options are checked next in order.  Whether or not a set of
   protocols, or of protocol-specific options, can successfully connect
   is generally not dependent on which specific IP address is used.
   Furthermore, the protocol stacks being attempted may influence or
   altogether change the endpoints being used.  Adding a proxy to a
   connection's branch will change the endpoint to the proxy's IP
   address or hostname.  Choosing an alternate protocol may also modify
   the ports that should be selected.

   Branching for derived endpoints is the final step, and may have
   multiple layers of derivation or resolution, such as DNS service
   resolution and DNS hostname resolution.
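   The branching order can be sketched as nested expansion: paths at the first level, protocol stacks at the second, and derived endpoints last, with each level branching on exactly one dimension.  This Python sketch assumes a hypothetical `resolve(hostname, path, stack)` hook that returns the endpoints usable for that stack on that path.  Those endpoints could be DNS results obtained on that path, or a proxy address when the stack uses a proxy.  Leaves are marked with `children` set to `None`, so a branch whose resolution returned nothing yields no leaves.

```python
from collections.abc import Callable

Tuple3 = tuple[str, str, str]                      # (endpoint, path, protocol)
Resolver = Callable[[str, str, str], list[str]]


def build_tree(hostname: str, paths: list[str], stacks: list[str],
               resolve: Resolver) -> dict:
    """Branch on paths, then protocol stacks, then derived endpoints."""
    def leaf(endpoint: str, path: str, stack: str) -> dict:
        return {"node": (endpoint, path, stack), "children": None}

    return {
        "node": (hostname, "Any", "Any"),
        "children": [                                  # 1. alternate paths
            {"node": (hostname, path, "Any"),
             "children": [                             # 2. protocol options
                 {"node": (hostname, path, stack),
                  "children": [                        # 3. derived endpoints
                      leaf(endpoint, path, stack)
                      for endpoint in resolve(hostname, path, stack)]}
                 for stack in stacks]}
            for path in paths],
    }


def leaves(tree: dict) -> list[Tuple3]:
    """Collect leaf tuples in tree order."""
    if tree["children"] is None:
        return [tree["node"]]
    return [t for child in tree["children"] for t in leaves(child)]


# Hypothetical per-path DNS answers: the LTE network also returns IPv6.
dns = {"Wi-Fi": ["192.0.2.1:80"], "LTE": ["192.0.2.1:80", "[2001:DB8::1]:80"]}
tree = build_tree("www.example.com:80", ["Wi-Fi", "LTE"], ["TCP"],
                  lambda host, path, stack: dns[path])
for t in leaves(tree):
    print(t)
```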

5.  Connection Establishment Dynamics

   The primary goal of the connection establishment process is to
   successfully negotiate a protocol stack to an endpoint over an
   interface (that is, to connect a single leaf node of the tree) with
   as little delay and as few unnecessary connection attempts as
   possible.  Optimizing these two factors improves the user experience
   while minimizing network load.

   This section covers the dynamic aspect of connection establishment.
   While the tree described above is a useful conceptual and
   architectural model, an implementation does not know up front what
   the full tree may become, nor will many of the possible branches be
   used in the common case.

5.1.  Building the Tree

   The tree of options is built dynamically, out from the original trunk
   node.  Any time that a connection attempt may be made directly to an
   endpoint without further derivation, and without needing to try
   alternate paths or protocol options that have not yet been covered by
   previous branches, the implementation SHOULD treat this as a leaf
   node and connect directly.  Any time that an implementation chooses
   to branch between multiple options, it SHOULD determine a preferred
   order between the child nodes based on system policy, expected or
   historical performance, and application preference.

   When multiple paths are available, and permitted by the system's
   policy, the implementation SHOULD branch between the various paths.
   The list SHOULD be sorted based on the system policies and routes
   (which often determine a "default" interface), preferences expressed
   by the application, and expected performance based on measured or
   advertised properties of each path.

   When multiple protocol options are allowed by an application, and the
   system and implementation identify valid sets of protocols and
   protocol options, the implementation SHOULD branch between these
   sets.  This list SHOULD be sorted based on application preference and
   expected performance, generally measured in terms of latency and
   bandwidth.

   An implementation will only branch to derive endpoints when
   necessary.  This step involves the most external information, as
   endpoint derivation is often a process that requires fetching
   information from the network.  Before branching, an implementation
   must first generate the list of derived endpoints.  Once this list is
   sufficiently populated to continue, the implementation SHOULD sort
   the list based on preference and expected performance.  When these
   derived endpoints are IP addresses, implementations SHOULD use the
   algorithm in [RFC6724] to sort the addresses.  In cases where
   additional information can become available after the initial tree
   has been constructed, the implementation SHOULD update the tree to
   reflect new information and orderings if none of the leaf nodes are
   fully established.
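   The sorting and late-update rules can be sketched per branching node.  In this Python sketch, the `Child` fields and their priority order are assumptions: system policy first, then application preference, then expected latency.  The key is a simple stand-in, not the [RFC6724] address-selection algorithm.  Children that have already been started keep their positions, and only the not-yet-started tail is re-sorted.  Nothing changes once a leaf below the node is established.

```python
from dataclasses import dataclass, field


@dataclass
class Child:
    label: str
    policy_rank: int             # lower = preferred by system policy
    app_preference: int          # lower = preferred by the application
    expected_latency_ms: float   # measured, historical, or advertised


@dataclass
class BranchNode:
    children: list[Child] = field(default_factory=list)
    started: int = 0             # children[:started] are already racing
    established: bool = False    # True once any leaf below has connected

    def sort_pending(self) -> None:
        """Sort only the children that have not been started yet."""
        self.children[self.started:] = sorted(
            self.children[self.started:],
            key=lambda c: (c.policy_rank, c.app_preference,
                           c.expected_latency_ms))

    def add_late_child(self, child: Child) -> bool:
        """Fold in an option learned after the tree was built (for example,
        a newly available path).  Ignored once a leaf is established."""
        if self.established:
            return False
        self.children.append(child)
        self.sort_pending()
        return True


node = BranchNode([Child("LTE", 1, 0, 60.0), Child("Wi-Fi", 0, 0, 20.0)])
node.sort_pending()
node.started = 1                      # the Wi-Fi attempt is now in progress
node.add_late_child(Child("Ethernet", 0, 0, 5.0))
print([c.label for c in node.children])   # ['Wi-Fi', 'Ethernet', 'LTE']
```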

5.2.  Racing Methods

   There are three different approaches to racing the attempts for
   different nodes of the connection establishment tree:

   1.  Immediate: all child nodes are started at the same time

   2.  Delayed: child nodes are started one after another, separated by
       a delay (Section 5.2.1)

   3.  Failover: a child node is started only after the previous one
       has failed (Section 5.2.2)

   Each approach is appropriate in different use cases and branch types.
   However, to avoid consuming unnecessary network resources,
   implementations SHOULD NOT use immediate racing as a default
   approach.

   The timing algorithms for racing SHOULD remain independent across
   branches of the tree.  Any timers or racing logic are isolated to a
   given parent node, and are not ordered precisely with regard to other
   children of other nodes.

5.2.1.  Delayed Racing

   Delayed racing can be used whenever a single node of the tree has
   multiple child nodes.  Based on the order determined when building
   the tree, the first child node will be initiated immediately,
   followed by the next child node after some delay.  Once that second
   child node is initiated, the third child node (if present) will begin
   after another delay, and so on until all child nodes have been
   initiated, or one of the child nodes successfully completes its
   negotiation.

   Delayed racing attempts occur in parallel.  Implementations SHOULD
   NOT terminate an earlier child connection attempt upon starting a
   secondary child.

   The delay between starting child nodes SHOULD be based on the
   properties of the previously started child node.  For example, if the
   first child represents an IP address with a known route, and the
   second child represents another IP address, the delay between
   starting attempts to the first and second IP addresses can be based
   on the expected retransmission cadence for the first child's
   connection (derived from historical round-trip time).  Alternatively,
   if the first child represents a branch on a Wi-Fi interface, and the
   second child represents a branch on an LTE interface, the delay
   should be based on the expected time in which the branch for the
   first interface would be able to establish a connection, based on
   link quality and historical round-trip time.

   Any delay SHOULD have a defined minimum and maximum value based on
   the branch type.  Generally, branches between paths and protocols
   should have longer delays than branches between derived endpoints.
   The maximum delay should be considered with regard to how long a
   user is expected to wait for the connection to complete.
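   A delay computation that honors per-branch-type bounds might look like this Python sketch.  The bound values, the default delays, and the choice of twice the previous child's expected round-trip time are all hypothetical placeholders.  Only two properties come from the text: the delay derives from the previously started child, and path and protocol branches get longer delays than derived-endpoint branches.

```python
from enum import Enum
from typing import Optional


class BranchType(Enum):
    DERIVED_ENDPOINT = 1
    PROTOCOL = 2
    PATH = 3


# Hypothetical (minimum, maximum, default) delays in seconds.  Path and
# protocol branches get longer delays than derived-endpoint branches.
DELAY_BOUNDS = {
    BranchType.DERIVED_ENDPOINT: (0.1, 2.0, 0.25),
    BranchType.PROTOCOL: (0.25, 3.0, 0.5),
    BranchType.PATH: (0.5, 4.0, 1.0),
}


def next_child_delay(branch: BranchType, previous_rtt: Optional[float],
                     rtt_multiplier: float = 2.0) -> float:
    """Seconds to wait before starting the next child.

    Scales the previously started child's expected round-trip time (a rough
    proxy for its retransmission cadence) and clamps the result to the
    branch type's bounds; with no RTT estimate, the default applies.
    """
    minimum, maximum, default = DELAY_BOUNDS[branch]
    if previous_rtt is None:
        return default
    return min(maximum, max(minimum, rtt_multiplier * previous_rtt))


print(next_child_delay(BranchType.DERIVED_ENDPOINT, 0.02))  # 0.1
print(next_child_delay(BranchType.PATH, 0.4))               # 0.8
print(next_child_delay(BranchType.PATH, 10.0))              # 4.0
```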

   If a child node fails to connect before the delay timer has fired for
   the next child, the next child SHOULD be started immediately.
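   Delayed racing, including the rule above, can be sketched with Python's `asyncio`.  This minimal sketch assumes each child attempt is a zero-argument coroutine function that returns on success and raises on failure.  `delayed_race` and `fake_attempt` are hypothetical, and a single fixed `delay` is used for simplicity.  A fuller implementation would compute the delay for each child from the properties of the previously started child.

```python
import asyncio
from collections.abc import Awaitable, Callable, Sequence

# A child attempt: a zero-argument coroutine function that returns on
# success and raises on failure (a hypothetical interface).
Attempt = Callable[[], Awaitable[object]]


async def delayed_race(attempts: Sequence[Attempt], delay: float) -> object:
    """Start children in order, `delay` seconds apart; return the first
    successful result.  Earlier children keep running when later ones
    start, a failure starts the next child immediately, and the losers
    are cancelled once one child succeeds."""
    pending: set[asyncio.Future] = set()
    errors: list[BaseException] = []
    next_index = 0
    try:
        while True:
            if next_index < len(attempts):
                pending.add(asyncio.ensure_future(attempts[next_index]()))
                next_index += 1
            if not pending:
                raise ConnectionError(f"all attempts failed: {errors!r}")
            # Arm the stagger timer only while a child is still waiting.
            timeout = delay if next_index < len(attempts) else None
            done, pending = await asyncio.wait(
                pending, timeout=timeout,
                return_when=asyncio.FIRST_COMPLETED)
            for task in done:
                if task.exception() is None:
                    return task.result()
                errors.append(task.exception())
            # Either the timer fired or a child failed: loop to start the
            # next child (if any) without further delay.
    finally:
        for task in pending:
            task.cancel()
        await asyncio.gather(*pending, return_exceptions=True)


async def fake_attempt(label: str, seconds: float, ok: bool,
                       log: list[str]) -> str:
    """Hypothetical stand-in for a connection attempt."""
    log.append(label)
    await asyncio.sleep(seconds)
    if not ok:
        raise OSError(f"{label} failed")
    return label


log: list[str] = []
winner = asyncio.run(delayed_race(
    [lambda: fake_attempt("192.0.2.1", 1.0, True, log),
     lambda: fake_attempt("192.0.2.2", 0.01, True, log)],
    delay=0.05))
print(winner, log)
```

   Cancelling the losing attempts is one way to make them ineligible for the original request.  Section 5.3 also allows an implementation to let some handshakes finish in order to gather metrics.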

5.2.2.  Failover

   If an implementation or application has a strong preference for one
   branch over another, the branching node may choose to wait until one
   child has failed before starting the next.  Failure of a leaf node is
   determined by its protocol negotiation failing or timing out; failure
   of a parent branching node is determined by all of its children
   failing.

   An example in which failover is recommended is a race between a
   protocol stack that uses a proxy and a protocol stack that bypasses
   the proxy.  Failover is useful in case the proxy is down or
   misconfigured, but any more aggressive type of racing may end up
   unnecessarily avoiding a proxy that was preferred by policy.
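   Failover can be sketched as a sequential loop.  In this Python sketch, `failover`, the per-attempt timeout, and the simulated `via_proxy` and `direct` attempts are hypothetical.  The proxy attempt is preferred, and the direct attempt starts only after the proxy attempt fails or times out.

```python
import asyncio
from collections.abc import Awaitable, Callable, Sequence


async def failover(attempts: Sequence[Callable[[], Awaitable[object]]],
                   attempt_timeout: float) -> object:
    """Try children strictly in preference order.  The next child starts
    only after the previous one fails or times out; the branching node
    fails only when every child has failed."""
    errors: list[BaseException] = []
    for attempt in attempts:
        try:
            return await asyncio.wait_for(attempt(), attempt_timeout)
        except (OSError, asyncio.TimeoutError) as exc:
            errors.append(exc)
    raise ConnectionError(f"all {len(errors)} children failed: {errors!r}")


async def via_proxy() -> str:
    # Simulated outage of the policy-preferred proxy (hypothetical).
    raise ConnectionRefusedError("proxy 192.0.2.8:80 refused the connection")


async def direct() -> str:
    await asyncio.sleep(0)
    return "direct to 192.0.2.1:80"


print(asyncio.run(failover([via_proxy, direct], attempt_timeout=5.0)))
```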

560	5.3.  Completing Establishment

562	   The process of connection establishment completes when one leaf node
563	   of the tree has completed negotiation with the remote endpoint
564	   successfully, or else all nodes of the tree have failed to connect.
565	   The first leaf node to complete its connection is then used by the
566	   application to send and receive data.

568	   It is useful to process success and failure throughout the tree by
569	   child nodes reporting to their parent nodes (towards the trunk of the
570	   tree).  For example, in the following case, if 1.1.1 fails to
571	   connect, it reports the failure to 1.1.  Since 1.1 has no other child
572	   nodes, it also has failed and reports that failure to 1.  Because 1.2
573	   has not yet failed, 1 is not considered to have failed.  Since 1.2
574	   has not yet started, it is started and the process continues.
575	   Similarly, if 1.1.1 successfully connects, then it marks 1.1 as
576	   connected, which propagates to the trunk node 1.  At this point, the
577	   connection as a whole is considered to be successfully connected and
578	   ready to process application data.

580	   1 [www.example.com:80, Any, TCP]
581	     1.1 [www.example.com:80, Wi-Fi, TCP]
582	       1.1.1 [192.0.2.1:80, Wi-Fi, TCP]
583	     1.2 [www.example.com:80, LTE, TCP]
584	       ...
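   The propagation of success and failure toward the trunk can be
   sketched in Python.  The Node class, its state names, and the rule
   of starting the first idle sibling are assumptions made for
   illustration.  In this sketch, "starting" a node only marks it as
   connecting; it does not open a socket.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    """One node of a connection attempt tree (illustrative structure)."""
    label: str
    state: str = "idle"  # "idle", "connecting", "connected", or "failed"
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child

    def report_success(self) -> None:
        """Mark this node connected and propagate toward the trunk."""
        self.state = "connected"
        if self.parent is not None and self.parent.state != "connected":
            self.parent.report_success()

    def report_failure(self) -> Optional["Node"]:
        """Mark this node failed and propagate toward the trunk.
        Returns the sibling that was started as a result, if any."""
        self.state = "failed"
        parent = self.parent
        if parent is None:
            return None  # the trunk failed: establishment as a whole failed
        if all(child.state == "failed" for child in parent.children):
            return parent.report_failure()
        for sibling in parent.children:
            if sibling.state == "idle":
                sibling.state = "connecting"
                return sibling
        return None  # other siblings are still connecting
```

   With the tree above, a failure of 1.1.1 marks 1.1 as failed and
   starts 1.2, while a success of 1.1.1 marks both 1.1 and 1 as
   connected.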

586	   If a leaf node has successfully completed its connection, all other
587	   attempts SHOULD be made ineligible for use by the application for the
588	   original request.  New connection attempts that involve transmitting
589	   data on the network SHOULD NOT be started after another leaf node has
590	   completed successfully, as the connection as a whole has been
591	   established.  An implementation MAY choose to let certain handshakes
592	   and negotiations complete in order to gather metrics to influence
593	   future connections.  Similarly, an implementation MAY choose to hold
594	   onto fully established leaf nodes that were not the first to
595	   establish for use in future connections, but this approach is not
596	   recommended since those attempts were slower to connect and may
597	   exhibit less desirable properties.
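   The eligibility rules above can be kept in a small bookkeeping
   object.  In this hypothetical Python sketch, leaves are identified by
   their tree labels.  Handshake times are recorded for every leaf that
   finishes, which an implementation MAY use to inform future
   connections, but only the first leaf to finish is handed to the
   application.

```python
from typing import Dict, Optional


class Establishment:
    """Tracks the winning leaf of one connection request (illustrative)."""

    def __init__(self) -> None:
        self.winner: Optional[str] = None
        self.metrics: Dict[str, float] = {}  # handshake times, per leaf

    def may_start_attempt(self) -> bool:
        # No new network-bound attempts once a leaf has connected.
        return self.winner is None

    def report_success(self, leaf: str, handshake_time: float) -> bool:
        """Return True only if this leaf is handed to the application."""
        self.metrics[leaf] = handshake_time  # kept for future decisions
        if self.winner is None:
            self.winner = leaf
            return True
        return False  # slower leaf: ineligible for the original request
```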

599	5.3.1.  Determining Successful Establishment

601	   Implementations may use different criteria, on a per-protocol basis,
602	   to decide when a leaf node is considered successfully connected.  If
603	   the only protocol being used is a transport protocol with a clear
604	   handshake, like TCP, then the obvious choice is to declare that node
605	   "connected" once the SYN-ACK, the last handshake packet the client
606	   receives, has arrived.  If the only protocol being used is an
607	   "unconnected" protocol, like UDP, the implementation may consider the
608	   node fully "connected" the moment it determines a route is present,
609	   before sending any packets on the network.
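   A minimal Python sketch of these two criteria, using the standard
   socket module and assuming IPv4 literal addresses.  For TCP,
   connect() returns once the handshake has completed from the client's
   side.  For UDP, calling connect() on a datagram socket sends nothing
   on the network; it only makes the operating system choose a route
   and a local address.  The function names are hypothetical.

```python
import socket
from typing import Optional, Tuple


def tcp_leaf_connected(ip: str, port: int, timeout: float = 1.0) -> bool:
    """TCP leaf: connected once connect() returns, which on the client
    happens after the SYN-ACK arrives and the final ACK is sent."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def udp_leaf_connected(ip: str, port: int) -> Optional[Tuple[str, int]]:
    """UDP leaf: connect() on a datagram socket sends no packets; it only
    asks the OS to pick a route and local address.  Returns that local
    address, or None if no route is available."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        try:
            sock.connect((ip, port))
        except OSError:
            return None
        return sock.getsockname()
```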

611	   For protocol stacks with multiple handshakes, the decision becomes
612	   more nuanced.  If the protocol stack involves both TLS and TCP, an
613	   implementation MAY determine that a leaf node is connected after the
614	   TCP handshake is complete, or it MAY wait for the TLS handshake to
615	   complete as well.  The benefit of declaring completion when the TCP
616	   handshake finishes, and thus stopping the race for other branches of
617	   the tree, is that there will be less burden on the network from other
618	   connection attempts.  On the other hand, by waiting until the TLS
619	   handshake is complete, an implementation avoids the scenario in which
620	   a TCP handshake completes quickly, but TLS negotiation is either very
621	   slow or fails altogether in particular network conditions or to a
622	   particular endpoint.
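   The choice between these two options can be expressed as a single
   policy flag.  In this hypothetical Python sketch, a leaf's stack is
   listed from the bottom up (for example, ["tcp", "tls"]), and the
   flag decides whether only the transport handshake or every handshake
   must finish before the leaf ends the race.

```python
from typing import Sequence, Set


def leaf_is_connected(stack: Sequence[str], completed: Set[str],
                      wait_for_all_handshakes: bool) -> bool:
    """Decide whether a leaf counts as connected.

    `stack` lists the leaf's handshaking protocols from the bottom up;
    `completed` holds those whose handshake has finished.  With the flag
    off, only the lowest (transport) handshake must finish, which stops
    the race sooner.  With it on, a slow or failing upper handshake such
    as TLS keeps the other branches racing.
    """
    required = stack if wait_for_all_handshakes else stack[:1]
    return all(layer in completed for layer in required)
```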

624	6.  API Considerations

626	   In general, the internal states and nodes of racing connection
627	   establishment do not need to be exposed to applications.  Instead,
628	   this process SHOULD be treated as an abstraction of a single,
629	   aggregate connection establishment behind an API.  This places some
630	   requirements on the API, including:

632	   o  The API must allow the application to specify an unresolved
633	      endpoint as the remote side of the connection, such as a URI or a
634	      hostname and port.  The application should also be able to
635	      provide constraints on path selection and protocol features.

637	   o  No read or write operation can take effect until one leaf node
638	      has been chosen as the connected node.  The API needs to either
639	      expose asynchronous reads and writes, or else prohibit reads and
640	      writes until the connection is established.

642	   o  The action of starting or initiating the connection may involve
643	      many network-bound operations, so this operation SHOULD be
644	      asynchronous.

646	   o  Properties of the connection, such as the remote and local
647	      addresses, the interface used, and the protocols used, may not be
648	      queryable until the connection is established.
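   The requirements above can be illustrated with a small asyncio-based
   API shape.  This is a minimal sketch, not an existing library: the
   RacingConnection class and the establish callback (which would
   resolve the name and race the tree internally) are hypothetical, and
   the written list stands in for the chosen leaf's socket.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional

# Hypothetical racing routine: takes an unresolved hostname and port,
# races internally, and returns the remote address of the winning leaf.
Establish = Callable[[str, int], Awaitable[str]]


class RacingConnection:
    """Illustrative API shape; create instances inside a running loop."""

    def __init__(self, hostname: str, port: int) -> None:
        self.hostname = hostname  # unresolved endpoint, no DNS yet
        self.port = port
        self.remote_address: Optional[str] = None  # unknown until established
        self.written: List[bytes] = []  # stands in for the chosen leaf
        self._established = asyncio.Event()

    async def start(self, establish: Establish) -> None:
        # Asynchronous: resolution and racing may take many round trips.
        self.remote_address = await establish(self.hostname, self.port)
        self._established.set()

    async def send(self, data: bytes) -> None:
        # Writes cannot take effect until one leaf has been chosen.
        await self._established.wait()
        self.written.append(data)
```

   A write issued before start() completes simply waits, and the remote
   address stays unset until a leaf has been chosen.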

650	6.1.  Handling 0-RTT Data

652	   Several protocols allow sending higher-level protocol or application
653	   data within the first packet of their protocol establishment, such as
654	   TCP Fast Open [RFC7413] and TLS 1.3 [I-D.ietf-tls-tls13].  This
655	   approach is referred to as sending Zero-RTT (0-RTT) data.  This is a
656	   desirable property, but poses challenges to an implementation that
657	   uses racing during connection establishment.

659	   If the application has 0-RTT data to send in any protocol handshakes,
660	   it needs to provide this data before the handshakes have begun.  When
661	   racing, this means that the data SHOULD be provided before the
662	   process of connection establishment has begun.  If the API allows the
663	   application to send 0-RTT data, it MUST provide an interface that
664	   identifies this data as idempotent data.  In general, 0-RTT data may
665	   be replayed (for example, if a TCP SYN contains data, and the SYN is
666	   retransmitted, the data will be retransmitted as well), but racing
667	   means that different leaf nodes have the opportunity to send the same
668	   data independently.  If data is truly idempotent, this should be
669	   permissible.

671	   Once the application has provided its 0-RTT data, an implementation
672	   SHOULD keep a copy of this data and provide it to each new leaf node
673	   that is started and for which a 0-RTT protocol is being used.
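   The handling of application 0-RTT data can be sketched as a holder
   object.  In this hypothetical Python sketch, the data must be
   supplied before establishment begins and explicitly marked
   idempotent, and a private copy is handed to every leaf that uses a
   0-RTT protocol.  Refusing non-idempotent data outright is a
   simplification chosen for the sketch.

```python
from typing import Optional


class EarlyDataError(Exception):
    pass


class ZeroRTTData:
    """Holds application 0-RTT data for every leaf node (illustrative)."""

    def __init__(self) -> None:
        self._data: Optional[bytes] = None
        self._establishment_started = False

    def provide(self, data: bytes, idempotent: bool) -> None:
        if self._establishment_started:
            raise EarlyDataError("0-RTT data must be provided before racing starts")
        if not idempotent:
            # Racing may send this data once per leaf, so this sketch
            # accepts only data the application marks as idempotent.
            raise EarlyDataError("0-RTT data must be marked idempotent")
        self._data = bytes(data)  # private copy, even if given a bytearray

    def begin_establishment(self) -> None:
        self._establishment_started = True

    def data_for_leaf(self, leaf_uses_0rtt: bool) -> Optional[bytes]:
        # Every new leaf that uses a 0-RTT protocol gets the same bytes.
        return self._data if leaf_uses_0rtt else None
```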

675	   It is also possible that protocol stacks within a particular leaf
676	   node use 0-RTT handshakes without any idempotent application data.
677	   For example, TCP Fast Open could carry a TLS ClientHello as its
678	   0-RTT data, shortening the cumulative handshake time.

680	   0-RTT handshakes often rely on previous state, such as TCP Fast Open
681	   cookies, previously established TLS tickets, or pre-shared keys
682	   (PSKs) distributed out of band.  Implementations should be aware
683	   of security concerns around using these tokens across multiple
684	   addresses or paths when racing.  In the case of TLS, any given ticket
685	   or PSK SHOULD only be used on one leaf node.  If implementations have
686	   multiple tickets available from a previous connection, each leaf node
687	   attempt MUST use a different ticket.  In effect, each leaf node will
688	   send the same early application data, yet encoded (encrypted)
689	   differently on the wire.
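   The one-ticket-per-leaf rule can be enforced by a pool that hands
   out each ticket or PSK at most once.  This is a minimal Python
   sketch; the TicketPool name is hypothetical.  The sketch assumes that
   a leaf which finds the pool empty performs a full handshake without
   early data.

```python
from typing import List, Optional


class TicketPool:
    """Hands each TLS ticket (or PSK) to at most one leaf node."""

    def __init__(self, tickets: List[bytes]) -> None:
        self._unused = list(tickets)

    def ticket_for_leaf(self) -> Optional[bytes]:
        # Removing the ticket guarantees no two leaves share it; None
        # means this leaf must do a full handshake without 0-RTT.
        return self._unused.pop(0) if self._unused else None
```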

691	7.  Security Considerations

693	   See Section 6.1 for security considerations around racing with 0-RTT
694	   data.

696	   An attacker that knows a particular device is racing several options
697	   during connection establishment may be able to block packets for the
698	   first connection attempt, thus inducing the device to fall back to a
699	   secondary attempt.  This is a problem if the secondary attempts have
700	   worse security properties that enable further attacks.
701	   Implementations should ensure that all options have equivalent
702	   security properties to avoid incentivizing attacks.
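   One way to follow this advice is to prune weaker options before
   racing.  The sketch below assumes that each candidate option can be
   described by a set of named security properties.  The property names
   and the policy of comparing against the first (preferred) option are
   hypothetical.  Options missing any property of the preferred one are
   dropped, so blocking the first attempt cannot force a fallback to a
   weaker one.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Option:
    label: str
    security: FrozenSet[str]  # hypothetical property names, e.g. "tls1.3"


def prune_weaker_options(options: List[Option]) -> List[Option]:
    """Keep only options offering every security property of the
    preferred (first) option."""
    if not options:
        return []
    required = options[0].security
    return [opt for opt in options if required <= opt.security]
```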

704	   Since results from the network can determine how a connection attempt
705	   tree is built, such as when DNS returns a list of resolved endpoints,
706	   it is possible for the network to cause an implementation to consume
707	   significant on-device resources.  Implementations SHOULD limit the
708	   maximum amount of state allowed for any given node, including the
709	   number of child nodes, especially when the state is based on results
710	   from the network.
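   A minimal Python sketch of bounding network-driven state when DNS
   answers become child nodes.  The cap of eight children is a
   hypothetical value; the draft does not specify one.

```python
from typing import List

MAX_CHILDREN = 8  # hypothetical cap; the draft does not give a number


def children_from_dns(addresses: List[str], limit: int = MAX_CHILDREN) -> List[str]:
    """Build child nodes from DNS answers, bounding network-driven state."""
    unique = list(dict.fromkeys(addresses))  # drop duplicates, keep order
    return unique[:limit]
```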

712	8.  IANA Considerations

714	   This document has no IANA actions.

716	9.  Acknowledgments

718	   Thanks to Josh Graessley and Stuart Cheshire for their help in the
719	   design of the original implementation of Happy Eyeballs for Apple
720	   that began this work.

722	10.  Informative References

724	   [I-D.ietf-quic-transport]
725	              Iyengar, J. and M. Thomson, "QUIC: A UDP-Based Multiplexed
726	              and Secure Transport", draft-ietf-quic-transport-07 (work
727	              in progress), October 2017.

729	   [I-D.ietf-tls-tls13]
730	              Rescorla, E., "The Transport Layer Security (TLS) Protocol
731	              Version 1.3", draft-ietf-tls-tls13-21 (work in progress),
732	              July 2017.

734	   [I-D.ietf-v6ops-rfc6555bis]
735	              Schinazi, D. and T. Pauly, "Happy Eyeballs Version 2:
736	              Better Connectivity Using Concurrency", draft-ietf-v6ops-
737	              rfc6555bis-06 (work in progress), October 2017.

739	   [RFC6724]  Thaler, D., Ed., Draves, R., Matsumoto, A., and T. Chown,
740	              "Default Address Selection for Internet Protocol Version 6
741	              (IPv6)", RFC 6724, DOI 10.17487/RFC6724, September 2012,
742	              <https://www.rfc-editor.org/info/rfc6724>.

744	   [RFC7413]  Cheng, Y., Chu, J., Radhakrishnan, S., and A. Jain, "TCP
745	              Fast Open", RFC 7413, DOI 10.17487/RFC7413, December 2014,
746	              <https://www.rfc-editor.org/info/rfc7413>.

748	   [RFC7540]  Belshe, M., Peon, R., and M. Thomson, Ed., "Hypertext
749	              Transfer Protocol Version 2 (HTTP/2)", RFC 7540,
750	              DOI 10.17487/RFC7540, May 2015,
751	              <https://www.rfc-editor.org/info/rfc7540>.

753	   [RFC7556]  Anipko, D., Ed., "Multiple Provisioning Domain
754	              Architecture", RFC 7556, DOI 10.17487/RFC7556, June 2015,
755	              <https://www.rfc-editor.org/info/rfc7556>.

757	Author's Address

759	   Tommy Pauly
760	   Apple Inc.
761	   1 Infinite Loop
762	   Cupertino, California 95014
763	   United States of America

765	   Email: tpauly@apple.com