idnits 2.17.1 

draft-brunstrom-taps-impl-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** There are 4 instances of too long lines in the document, the longest one
     being 3 characters in excess of 72.

  ** There are 3 instances of lines with control characters in the document.

  ** The abstract seems to contain references ([I-D.pauly-taps-arch]), which
     it shouldn't.  Please replace those with straight textual mentions of the
     documents in question.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (March 05, 2018) is 2244 days in the past.  Is this
     intentional?


  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  == Outdated reference: A later version (-11) exists of
     draft-ietf-taps-minset-02

  ** Obsolete normative reference: RFC 7540 (Obsoleted by RFC 9113)

  == Outdated reference: A later version (-34) exists of
     draft-ietf-quic-transport-10

  == Outdated reference: A later version (-28) exists of
     draft-ietf-tls-tls13-26

  -- Obsolete informational reference (is this intentional?): RFC 5245
     (Obsoleted by RFC 8445, RFC 8839)


     Summary: 4 errors (**), 0 flaws (~~), 4 warnings (==), 2 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	TAPS Working Group                                     A. Brunstrom, Ed.
3	Internet-Draft                                       Karlstad University
4	Intended status: Informational                             T. Pauly, Ed.
5	Expires: September 6, 2018                                    Apple Inc.
6	                                                             T. Enghardt
7	                                                               TU Berlin
8	                                                           K-J. Grinnemo
9	                                                     Karlstad University
10	                                                                T. Jones
11	                                                  University of Aberdeen
12	                                                               P. Tiesel
13	                                                               TU Berlin
14	                                                              C. Perkins
15	                                                   University of Glasgow
16	                                                                M. Welzl
17	                                                      University of Oslo
18	                                                          March 05, 2018

20	             Implementing Interfaces to Transport Services
21	                    draft-brunstrom-taps-impl-00

23	Abstract

25	   The Transport Services architecture [I-D.pauly-taps-arch] defines a
26	   system that allows applications to use transport networking protocols
27	   flexibly.  This document serves as a guide for implementers on how
28	   to build such a system.

30	Status of This Memo

32	   This Internet-Draft is submitted in full conformance with the
33	   provisions of BCP 78 and BCP 79.

35	   Internet-Drafts are working documents of the Internet Engineering
36	   Task Force (IETF).  Note that other groups may also distribute
37	   working documents as Internet-Drafts.  The list of current Internet-
38	   Drafts is at https://datatracker.ietf.org/drafts/current/.

40	   Internet-Drafts are draft documents valid for a maximum of six months
41	   and may be updated, replaced, or obsoleted by other documents at any
42	   time.  It is inappropriate to use Internet-Drafts as reference
43	   material or to cite them other than as "work in progress."

45	   This Internet-Draft will expire on September 6, 2018.

47	Copyright Notice

49	   Copyright (c) 2018 IETF Trust and the persons identified as the
50	   document authors.  All rights reserved.

52	   This document is subject to BCP 78 and the IETF Trust's Legal
53	   Provisions Relating to IETF Documents
54	   (https://trustee.ietf.org/license-info) in effect on the date of
55	   publication of this document.  Please review these documents
56	   carefully, as they describe your rights and restrictions with respect
57	   to this document.  Code Components extracted from this document must
58	   include Simplified BSD License text as described in Section 4.e of
59	   the Trust Legal Provisions and are provided without warranty as
60	   described in the Simplified BSD License.

62	Table of Contents

64	   1.  Introduction
65	   2.  Implementing Basic Objects
66	   3.  Implementing Pre-Establishment
67	     3.1.  Configuration-time errors
68	     3.2.  Role of system policy
69	   4.  Implementing Connection Establishment
70	     4.1.  Candidate Gathering
71	       4.1.1.  Structuring Options as a Tree
72	       4.1.2.  Branch Types
73	     4.2.  Branching Order-of-Operations
74	     4.3.  Sorting Branches
75	     4.4.  Candidate Racing
76	       4.4.1.  Delayed Racing
77	       4.4.2.  Failover
78	     4.5.  Completing Establishment
79	       4.5.1.  Determining Successful Establishment
80	     4.6.  Establishing multiplexed connections
81	     4.7.  Handling racing with "unconnected" protocols
82	     4.8.  Implementing listeners
83	       4.8.1.  Implementing listeners for Connected Protocols
84	       4.8.2.  Implementing listeners for Unconnected Protocols
85	       4.8.3.  Implementing listeners for Multiplexed Protocols
86	   5.  Implementing Data Transfer
87	     5.1.  Data transfer for streams, datagrams, and frames
88	       5.1.1.  Sending Messages
89	       5.1.2.  Receiving Messages
90	     5.2.  Handling of data for fast-open protocols
91	   6.  Implementing Maintenance
92	     6.1.  Changing Protocol Properties
93	     6.2.  Handling Path Changes
94	   7.  Implementing Termination
95	   8.  Cached State
96	     8.1.  Protocol state caches
97	     8.2.  Performance caches
98	   9.  Specific Transport Protocol Considerations
99	     9.1.  TCP
100	     9.2.  UDP
101	     9.3.  SCTP
102	     9.4.  TLS
103	     9.5.  HTTP
104	     9.6.  QUIC
105	     9.7.  HTTP/2 transport
106	   10. Rendezvous and Environment Discovery
107	   11. IANA Considerations
108	   12. Security Considerations
109	     12.1.  Considerations for Candidate Gathering
110	     12.2.  Considerations for Candidate Racing
111	   13. Acknowledgements
112	   14. References
113	     14.1.  Normative References
114	     14.2.  Informative References
115	   Appendix A.  Additional Properties
116	     A.1.  Properties Affecting Sorting of Branches
117	     A.2.  Send Parameters
118	   Authors' Addresses

120	1.  Introduction

122	   The Transport Services architecture [I-D.pauly-taps-arch] defines a
123	   system that allows applications to use transport networking protocols
124	   flexibly.  The interface such a system exposes to applications is
125	   defined as the Transport Services API [I-D.trammell-taps-interface].
126	   This API is designed to be generic across multiple transport
127	   protocols and sets of protocol features.

129	   This document serves as a guide for implementers on how to build a
130	   system that provides a Transport Services API.  It is the job of an
131	   implementation of a Transport Services system to turn the requests of
132	   an application into decisions on how to establish connections, and
133	   how to transfer data over those connections once established.  The
134	   terminology used in this document is based on the Architecture
135	   [I-D.pauly-taps-arch].

137	2.  Implementing Basic Objects

139	   The basic objects that are exposed to applications for Transport
140	   Services are the Preconnection, the bundle of properties that
141	   describes the application constraints on the transport; the
142	   Connection, the basic object that represents a flow of data in either
143	   direction between the Local and Remote Endpoints; and the Listener, a
144	   passive waiting object that delivers new Connections.

146	   Preconnection objects should be implemented as bundles of properties
147	   that an application can both read and write.  Once a Preconnection
148	   has been used to create an outbound Connection or a Listener, the
149	   implementation should ensure that the copy of the properties held by
150	   the Connection or Listener is immutable.  This may involve performing
151	   a deep-copy if the application is still able to modify properties on
152	   the original Preconnection object.
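
   The copy-on-use behaviour described above can be sketched as
   follows.  This is a minimal illustration, not a prescribed design:
   the class layout, the "initiate" method, and the property names are
   assumptions made for the example and are not defined by the API.

```python
import copy
from types import MappingProxyType


class Connection:
    """Hypothetical Connection holding its own copy of the properties."""

    def __init__(self, properties):
        # Deep-copy so nested values are not shared with the Preconnection,
        # then expose the copy through a read-only mapping view.
        self.properties = MappingProxyType(copy.deepcopy(properties))


class Preconnection:
    """Hypothetical bundle of properties the application can read and write."""

    def __init__(self, **properties):
        self.properties = dict(properties)

    def initiate(self):
        # The new Connection receives a snapshot, not a reference.
        return Connection(self.properties)


pre = Preconnection(reliability="required", interfaces=["Wi-Fi"])
conn = pre.initiate()
pre.properties["interfaces"].append("LTE")  # later change by the application
print(conn.properties["interfaces"])        # the Connection still sees ['Wi-Fi']
```

   The read-only view only guards the top-level mapping; an
   implementation that stores mutable nested values would need to
   protect those as well.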

154	   Connection objects represent the interface between the application
155	   and the implementation to manage transport state, and conduct data
156	   transfer.  During the process of establishment (Section 4), the
157	   Connection is not yet bound to a specific transport flow, since there
158	   may be multiple candidate Protocol Stacks being raced.  Once the
159	   Connection is established, the object should be considered mapped to
160	   a specific Protocol Stack.  The notion of a Connection maps to many
161	   different protocols, depending on the Protocol Stack.  For example,
162	   the Connection may ultimately represent the interface into a TCP
163	   connection, a TLS session over TCP, a UDP flow with fully-specified
164	   local and remote endpoints, a DTLS session, an SCTP stream, a QUIC
165	   stream, or an HTTP/2 stream.

167	   Listener objects are created with a Preconnection, at which point
168	   their configuration should be considered immutable by the
169	   implementation.  The process of listening is described in
170	   Section 4.8.

172	3.  Implementing Pre-Establishment

174	   During pre-establishment the application specifies the Endpoints to
175	   be used for communication as well as its preferences regarding
176	   Protocol and Path Selection.  The implementation stores these objects
177	   and properties as part of the Preconnection object for use during
178	   connection establishment.  For Protocol and Path Selection Properties
179	   that are not provided by the application, the implementation must use
180	   the default values specified in the Transport Services API
181	   ([I-D.trammell-taps-interface]).

183	3.1.  Configuration-time errors

185	   The transport system should have a list of supported protocols
186	   available, each of which has transport features reflecting the
187	   capabilities of the protocol.  Once an application specifies its
188	   Transport Parameters, the transport system should match the required
189	   and prohibited properties against the transport features of the
190	   available protocols.

192	   In the following cases, failure should be detected during pre-
193	   establishment:

195	   o  The application requested Protocol Properties that include
196	      requirements or prohibitions that cannot be satisfied by any of
197	      the available protocols.  For example, if an application requires
198	      "Configure Reliability per Message", but no such protocol is
199	      available on the host running the transport system, e.g., because
200	      SCTP is not supported by the operating system, this should result
201	      in an error.

203	   o  The application requested Protocol Properties that are in conflict
204	      with each other, i.e., the required and prohibited properties
205	      cannot be satisfied by the same protocol.  For example, if an
206	      application prohibits "Reliable Data Transfer" but then requires
207	      "Configure Reliability per Message", this mismatch should result
208	      in an error.

210	   It is important to fail as early as possible in such cases in order
211	   to avoid allocating resources, e.g., to endpoint resolution, only to
212	   find out later that there is no protocol that satisfies the
213	   requirements.
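
   One way to detect both failure cases early is to match the
   requested properties against a table of protocol features before
   any other work is done.  The sketch below assumes a hypothetical
   feature table and a hypothetical "ConfigurationError" exception;
   the features listed per protocol are illustrative only.

```python
class ConfigurationError(Exception):
    """Raised when no available protocol can satisfy the application."""


# Hypothetical feature table for illustration only.
SUPPORTED = {
    "TCP": {"Reliable Data Transfer"},
    "UDP": set(),
    "SCTP": {"Reliable Data Transfer", "Configure Reliability per Message"},
}


def eligible_protocols(required, prohibited, supported=SUPPORTED):
    """Return the protocols offering every required and no prohibited feature."""
    required, prohibited = set(required), set(prohibited)
    both = required & prohibited
    if both:
        raise ConfigurationError(f"both required and prohibited: {sorted(both)}")
    matches = [
        name
        for name, features in supported.items()
        if required <= features and not (prohibited & features)
    ]
    if not matches:
        raise ConfigurationError("no available protocol satisfies the properties")
    return matches


print(eligible_protocols({"Reliable Data Transfer"}, set()))
```

   With this table, requiring "Configure Reliability per Message"
   while prohibiting "Reliable Data Transfer" leaves no match and
   raises the error before any endpoint resolution is attempted.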

215	3.2.  Role of system policy

217	   The properties specified during pre-establishment have a close
218	   connection to system policy.  The implementation is responsible for
219	   combining and reconciling several different sources of preferences
220	   when establishing Connections.  These include, but are not limited
221	   to:

223	   1.  Application preferences, i.e., preferences specified during
224	       pre-establishment, such as Local Endpoint, Remote Endpoint, Path
225	       Selection Properties, and Protocol Selection Properties.

227	   2.  Dynamic system policy, i.e., policy compiled from internally and
228	       externally acquired information about available network
229	       interfaces, supported transport protocols, and current/previous
230	       Connections.  Examples of ways to externally retrieve policy-
231	       support information are through OS-specific statistics/
232	       measurement tools and tools that reside on middleboxes and
233	       routers.

235	   3.  Default implementation policy, i.e., predefined policy by OS or
236	       application.

238	   In general, any protocol or path used for a connection must conform
239	   to all three sources of constraints.  A violation of any one of
240	   them should cause a protocol or path to be considered ineligible
241	   for use.  As an example of application preferences leading to
242	   constraints, an application may prohibit the use of metered network
243	   interfaces for a given Connection to avoid user cost.  Similarly, the
244	   system policy at a given time may prohibit the use of such a metered
245	   network interface from the application's process.  Lastly, the
246	   implementation itself may default to disallowing certain network
247	   interfaces unless explicitly requested by the application and allowed
248	   by the system.
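
   The rule that a path must satisfy all three sources of constraints
   can be sketched as a conjunction of independent checks.  The "Path"
   type and the three predicate arguments are hypothetical names
   chosen for this example; a real implementation would look the
   policies up from the platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    name: str
    metered: bool


def eligible_paths(paths, application_ok, system_ok, default_ok):
    """Keep only the paths that every source of constraints allows."""
    checks = (application_ok, system_ok, default_ok)
    return [p for p in paths if all(check(p) for check in checks)]


paths = [Path("Wi-Fi", metered=False), Path("LTE", metered=True)]
allowed = eligible_paths(
    paths,
    application_ok=lambda p: not p.metered,  # application prohibits metered paths
    system_ok=lambda p: True,                # system policy allows everything here
    default_ok=lambda p: True,               # implementation default allows everything
)
print([p.name for p in allowed])
```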

250	   It is expected that the database of system policies and the method of
251	   looking up these policies will vary across various platforms.  An
252	   implementation should attempt to look up the relevant policies for
253	   the system in a dynamic way to make sure it is reflecting an accurate
254	   version of the system policy, since the system's policy regarding the
255	   application's traffic may change over time due to user or
256	   administrative changes.

258	4.  Implementing Connection Establishment

260	   The process of establishing a network connection begins when an
261	   application expresses intent to communicate with a remote endpoint by
262	   calling Initiate.  (At this point, any constraints or requirements
263	   the application may have on the connection are available from pre-
264	   establishment.)  The process can be considered complete once there is
265	   at least one Protocol Stack that has completed any required setup to
266	   the point that it can transmit and receive the application's data.

268	   Connection establishment is divided into two top-level steps:
269	   Candidate Gathering, to identify the paths, protocols, and endpoints
270	   to use, and Candidate Racing, in which the necessary protocol
271	   handshakes are conducted in order to select which set to use.

273	   The simplest example of this process might involve identifying the
274	   single IP address to which the implementation wishes to connect,
275	   using the system's current default interface or path, and starting a
276	   TCP handshake to establish a stream to the specified IP address.
277	   However, each step may also vary depending on the requirements of the
278	   connection: if the endpoint is defined as a hostname and port, then
279	   there may be multiple resolved addresses that are available; there
280	   may also be multiple interfaces or paths available, other than the
281	   default system interface; and some protocols may not need any
282	   transport handshake to be considered "established" (such as UDP),
283	   while other connections may utilize layered protocol handshakes, such
284	   as TLS over TCP.

286	   Whenever an implementation has multiple options for connection
287	   establishment, it can view the set of all individual connection
288	   establishment options as a single, aggregate connection
289	   establishment.  The aggregate set conceptually includes every valid
290	   combination of endpoints, paths, and protocols.  As an example,
291	   consider an implementation that initiates a TCP connection to a
292	   hostname + port endpoint, and has two valid interfaces available (Wi-
293	   Fi and LTE).  The hostname resolves to a single IPv4 address on the
294	   Wi-Fi network, and resolves to the same IPv4 address on the LTE
295	   network, as well as a single IPv6 address.  The aggregate set of
296	   connection establishment options can be viewed as follows:

298	Aggregate [Endpoint: www.example.com:80] [Interface: Any]   [Protocol: TCP]
299	|-> [Endpoint: 192.0.2.1:80]       [Interface: Wi-Fi] [Protocol: TCP]
300	|-> [Endpoint: 192.0.2.1:80]       [Interface: LTE]   [Protocol: TCP]
301	|-> [Endpoint: 2001:DB8::1.80]     [Interface: LTE]   [Protocol: TCP]

303	   Any one of these sub-entries on the aggregate connection attempt
304	   would satisfy the original application intent.  The concern of this
305	   section is the algorithm defining which of these options to try,
306	   when, and in what order.

308	4.1.  Candidate Gathering

310	   The step of gathering candidates involves identifying which paths,
311	   protocols, and endpoints may be used for a given Connection.  This
312	   list is determined by the requirements, prohibitions, and preferences
313	   of the application as specified in the Path Selection Properties and
314	   Protocol Selection Properties.

316	4.1.1.  Structuring Options as a Tree

318	   When an implementation responsible for connection establishment needs
319	   to consider multiple options, it should logically structure these
320	   options as a hierarchical tree.  Each leaf node of the tree
321	   represents a single, coherent connection attempt, with an Endpoint, a
322	   Path, and a set of protocols that can directly negotiate and send
323	   data on the network.  Each node in the tree that is not a leaf
324	   represents a connection attempt that is either underspecified, or
325	   else includes multiple distinct options.  For example, when
326	   connecting on an IP network, a connection attempt to a hostname and
327	   port is underspecified, because the connection attempt requires a
328	   resolved IP address as its remote endpoint.  In this case, the node
329	   represented by the connection attempt to the hostname is a parent
330	   node, with child nodes for each IP address.  Similarly, an
331	   implementation that is allowed to connect using multiple interfaces
332	   will have a parent node of the tree for the decision between the
333	   paths, with a branch for each interface.

335	   The example aggregate connection attempt above can be drawn as a tree
336	   by grouping the addresses resolved on the same interface into
337	   branches:

339	                             ||
340	                +==========================+
341	                |  www.example.com:80/Any  |
342	                +==========================+
343	                  //                    \\
344	+==========================+       +==========================+
345	| www.example.com:80/Wi-Fi |       |  www.example.com:80/LTE  |
346	+==========================+       +==========================+
347	             ||                      //                    \\
348	  +====================+  +====================+  +======================+
349	  | 192.0.2.1:80/Wi-Fi |  |  192.0.2.1:80/LTE  |  |  2001:DB8::1.80/LTE  |
350	  +====================+  +====================+  +======================+

352	   The rest of this section will use a notation scheme to represent this
353	   tree.  The parent (or trunk) node of the tree will be represented by
354	   a single integer, such as "1".  Each child of that node will have an
355	   integer that identifies it, from 1 to the number of children.  That
356	   child node will be uniquely identified by concatenating its integer
357	   to its parent's identifier with a dot in between, such as "1.1" and
358	   "1.2".  Each node will be summarized by a tuple of three elements:
359	   Endpoint, Path, and Protocol.  The above example can now be written
360	   more succinctly as:

362	   1 [www.example.com:80, Any, TCP]
363	     1.1 [www.example.com:80, Wi-Fi, TCP]
364	       1.1.1 [192.0.2.1:80, Wi-Fi, TCP]
365	     1.2 [www.example.com:80, LTE, TCP]
366	       1.2.1 [192.0.2.1:80, LTE, TCP]
367	       1.2.2 [2001:DB8::1.80, LTE, TCP]

369	   When an implementation views this aggregate set of connection
370	   attempts as a single connection establishment, it will use only one
371	   of the leaf nodes to transfer data.  Thus, when a single leaf node
372	   becomes ready to use, then the entire connection attempt is ready to
373	   use by the application.  Another way to represent this is that every
374	   leaf node updates the state of its parent node when it becomes ready,
375	   until the trunk node of the tree is ready, which then notifies the
376	   application that the connection as a whole is ready to use.

378	   A connection establishment tree may be degenerate, and only have a
379	   single leaf node, such as a connection attempt to an IP address over
380	   a single interface with a single protocol.

382	   1 [192.0.2.1:80, Wi-Fi, TCP]

383	   A parent node may also have only one child (or leaf) node, such as
384	   when a hostname resolves to only a single IP address.

386	   1 [www.example.com:80, Wi-Fi, TCP]
387	     1.1 [192.0.2.1:80, Wi-Fi, TCP]
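
   The tree, its dotted identifiers, and the way readiness propagates
   from a leaf up to the trunk can be sketched as a small data
   structure.  The "Node" class and its method names are assumptions
   made for this example only.

```python
class Node:
    """Hypothetical node summarized by (Endpoint, Path, Protocol)."""

    def __init__(self, endpoint, path, protocol):
        self.endpoint, self.path, self.protocol = endpoint, path, protocol
        self.parent = None
        self.children = []
        self.ready = False

    def add(self, endpoint, path, protocol):
        child = Node(endpoint, path, protocol)
        child.parent = self
        self.children.append(child)
        return child

    def identifier(self):
        # The trunk is "1"; each child appends its 1-based position.
        if self.parent is None:
            return "1"
        index = self.parent.children.index(self) + 1
        return f"{self.parent.identifier()}.{index}"

    def leaves(self):
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]

    def mark_ready(self):
        # A ready leaf makes each of its ancestors ready, up to the trunk.
        node = self
        while node is not None:
            node.ready = True
            node = node.parent

    def lines(self, depth=0):
        summary = f"[{self.endpoint}, {self.path}, {self.protocol}]"
        yield f"{'  ' * depth}{self.identifier()} {summary}"
        for child in self.children:
            yield from child.lines(depth + 1)


root = Node("www.example.com:80", "Any", "TCP")
wifi = root.add("www.example.com:80", "Wi-Fi", "TCP")
wifi.add("192.0.2.1:80", "Wi-Fi", "TCP")
lte = root.add("www.example.com:80", "LTE", "TCP")
lte.add("192.0.2.1:80", "LTE", "TCP")
lte.add("2001:DB8::1.80", "LTE", "TCP")
print("\n".join(root.lines()))
```

   Calling mark_ready() on any one leaf is enough to make the trunk
   ready, while sibling branches remain unaffected.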

389	4.1.2.  Branch Types

391	   There are three types of branching from a parent node into one or
392	   more child nodes.  Any parent node of the tree must use only one type
393	   of branching.

395	4.1.2.1.  Derived Endpoints

397	   If a connection originally targets a single endpoint, there may be
398	   multiple endpoints of different types that can be derived from the
399	   original.  The connection library should order the derived endpoints
400	   according to application preference, system policy and expected
401	   performance.

403	   DNS hostname-to-address resolution is the most common method of
404	   endpoint derivation.  When trying to connect to a hostname endpoint
405	   on a traditional IP network, the implementation should send DNS
406	   queries for both A (IPv4) and AAAA (IPv6) records if both are
407	   supported on the local link.  The algorithm for ordering and racing
408	   these addresses should follow the recommendations in Happy Eyeballs
409	   [RFC8305].

411	   1 [www.example.com:80, Wi-Fi, TCP]
412	     1.1 [2001:DB8::1.80, Wi-Fi, TCP]
413	     1.2 [192.0.2.1:80, Wi-Fi, TCP]
414	     1.3 [2001:DB8::2.80, Wi-Fi, TCP]
415	     1.4 [2001:DB8::3.80, Wi-Fi, TCP]

417	   DNS-Based Service Discovery can also provide an endpoint derivation
418	   step.  When trying to connect to a named service, the client may
419	   discover one or more hostname and port pairs on the local network
420	   using multicast DNS.  These hostnames should each be treated as a
421	   branch which can be attempted independently from other hostnames.
422	   Each of these hostnames may also resolve to one or more addresses,
423	   thus creating multiple layers of branching.

425	   1 [term-printer._ipp._tcp.meeting.ietf.org, Wi-Fi, TCP]
426	     1.1 [term-printer.meeting.ietf.org:631, Wi-Fi, TCP]
427	       1.1.1 [31.133.160.18:631, Wi-Fi, TCP]

429	4.1.2.2.  Alternate Paths

431	   If a client has multiple network interfaces available to it, such as
432	   a mobile client with both Wi-Fi and cellular connectivity, it can
433	   attempt a connection over either interface.  This represents a branch
434	   point in the connection establishment.  As with derived endpoints,
435	   the interfaces should be ranked based on preference, system policy,
436	   and performance.  Attempts should be started on one interface, and
437	   then on other interfaces successively after delays based on expected
438	   round-trip-time or other available metrics.

440	   1 [192.0.2.1:80, Any, TCP]
441	     1.1 [192.0.2.1:80, Wi-Fi, TCP]
442	     1.2 [192.0.2.1:80, LTE, TCP]
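
   The staggered start of attempts across ranked interfaces can be
   sketched as a schedule of start offsets.  The function name, the
   choice of one expected round-trip time as the delay, and the
   default value are all assumptions made for this example; the text
   above permits other metrics.

```python
def start_offsets(ranked_paths, expected_rtt_ms, default_rtt_ms=250.0):
    """Map each path to the time (in ms) at which its attempt starts."""
    offsets = {}
    elapsed = 0.0
    for path in ranked_paths:
        offsets[path] = elapsed
        # Wait roughly one expected round trip before trying the next path.
        elapsed += expected_rtt_ms.get(path, default_rtt_ms)
    return offsets


print(start_offsets(["Wi-Fi", "LTE"], {"Wi-Fi": 20.0, "LTE": 60.0}))
```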

444	   This same approach applies to any situation in which the client is
445	   aware of multiple links or views of the network.  Multiple Paths,
446	   each with a coherent set of addresses, routes, DNS server, and more,
447	   may share a single interface.  A path may also represent a virtual
448	   interface service such as a Virtual Private Network (VPN).

450	   The list of available paths should be constrained by any requirements
451	   or prohibitions the application sets, as well as system policy.

453	4.1.2.3.  Protocol Options

455	   Differences in possible protocol compositions and options can also
456	   provide a branching point in connection establishment.  This allows
457	   clients to be resilient to situations in which a certain protocol is
458	   not functioning on a server or network.

460	   This approach is commonly used for connections with optional proxy
461	   server configurations.  A single connection may be allowed to use an
462	   HTTP-based proxy, a SOCKS-based proxy, or connect directly.  These
463	   options should be ranked and attempted in succession.

465	   1 [www.example.com:80, Any, HTTP/TCP]
466	     1.1 [192.0.2.8:80, Any, HTTP/HTTP Proxy/TCP]
467	     1.2 [192.0.2.7:10234, Any, HTTP/SOCKS/TCP]
468	     1.3 [www.example.com:80, Any, HTTP/TCP]
469	       1.3.1 [192.0.2.1:80, Any, HTTP/TCP]

471	   This approach also allows a client to attempt different sets of
472	   application and transport protocols that may provide preferable
473	   characteristics when available.  For example, the protocol options
474	   could involve QUIC [I-D.ietf-quic-transport] over UDP on one branch,
475	   and HTTP/2 [RFC7540] over TLS over TCP on the other:

477	   1 [www.example.com:443, Any, Any HTTP]
478	     1.1 [www.example.com:443, Any, QUIC/UDP]
479	       1.1.1 [192.0.2.1:443, Any, QUIC/UDP]
480	     1.2 [www.example.com:443, Any, HTTP2/TLS/TCP]
481	       1.2.1 [192.0.2.1:443, Any, HTTP2/TLS/TCP]

483	   Another example is racing SCTP with TCP:

485	   1 [www.example.com:80, Any, Any Stream]
486	     1.1 [www.example.com:80, Any, SCTP]
487	       1.1.1 [192.0.2.1:80, Any, SCTP]
488	     1.2 [www.example.com:80, Any, TCP]
489	       1.2.1 [192.0.2.1:80, Any, TCP]

491	   Implementations that support racing protocols and protocol options
492	   should maintain a history of which protocols and protocol options
493	   successfully established, on a per-network basis (see Section 8.2).
494	   This information can influence future racing decisions to prioritize
495	   or prune branches.
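
   A per-network history of establishment outcomes might be kept as
   simple success counters.  The "RacingHistory" class, the network
   label, and the neutral rate used for unseen protocols are
   assumptions made for this sketch.

```python
from collections import defaultdict


class RacingHistory:
    """Hypothetical per-network record of establishment outcomes."""

    def __init__(self):
        # (network, protocol) -> [successes, attempts]
        self._counts = defaultdict(lambda: [0, 0])

    def record(self, network, protocol, success):
        entry = self._counts[(network, protocol)]
        entry[0] += 1 if success else 0
        entry[1] += 1

    def success_rate(self, network, protocol, unknown=0.5):
        successes, attempts = self._counts.get((network, protocol), (0, 0))
        return successes / attempts if attempts else unknown

    def prioritize(self, network, protocols):
        # Stable sort: protocols with equal rates keep their given order.
        return sorted(protocols, key=lambda p: -self.success_rate(network, p))


history = RacingHistory()
history.record("home-wifi", "QUIC/UDP", False)
history.record("home-wifi", "HTTP2/TLS/TCP", True)
print(history.prioritize("home-wifi", ["QUIC/UDP", "HTTP2/TLS/TCP"]))
```

   On a network with no recorded history, the given order is left
   unchanged, so the application's own ranking still applies.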

497	4.2.  Branching Order-of-Operations

499	   Branch types must occur in a specific order relative to one another
500	   to avoid creating leaf nodes with invalid or incompatible settings.
501	   In the example above, it would be invalid to branch for derived
502	   endpoints (the DNS results for www.example.com) before branching
503	   between interface paths, since usable DNS results on one network may
504	   not necessarily be the same as DNS results on another network due to
505	   local network entities, supported address families, or enterprise
506	   network configurations.  Implementations must be careful to branch in
507	   an order that results in usable leaf nodes whenever there are
508	   multiple branch types that could be used from a single node.

510	   The order of operations for branching, where lower numbers are acted
511	   upon first, should be:

513	   1.  Alternate Paths

515	   2.  Protocol Options

517	   3.  Derived Endpoints

519	   Branching between paths is the first in the list because results
520	   across multiple interfaces are likely not related to one another:
521	   endpoint resolution may return different results, especially when
522	   using locally resolved host and service names, and which protocols
523	   are supported and preferred may differ across interfaces.  Thus, if
524	   multiple paths are attempted, the overall connection can be seen as a
525	   race between the available paths or interfaces.

527	   Protocol options are checked next in order.  Whether or not a set of
528	   protocols, or protocol-specific options, can successfully connect is
529	   generally not dependent on which specific IP address is used.
530	   Furthermore, the protocol stacks being attempted may influence or
531	   altogether change the endpoints being used.  Adding a proxy to a
532	   connection's branch will change the endpoint to the proxy's IP
533	   address or hostname.  Choosing an alternate protocol may also modify
534	   the ports that should be selected.

536	   Branching for derived endpoints is the final step, and may have
537	   multiple layers of derivation or resolution, such as DNS service
538	   resolution and DNS hostname resolution.
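
   The branching order can be sketched as three nested expansion
   steps.  The resolver below is a hypothetical stand-in for per-path
   DNS resolution that returns fixed example addresses; the function
   names and the tuple layout are assumptions made for this example.

```python
def example_resolver(hostname, path):
    """Hypothetical stand-in for per-path DNS resolution."""
    answers = {"Wi-Fi": ["192.0.2.1"], "LTE": ["192.0.2.1", "2001:DB8::1"]}
    return answers[path]


def gather_leaves(hostname, paths, protocols, resolver=example_resolver):
    """Expand paths first, then protocols, then derived endpoints."""
    leaves = []
    for i, path in enumerate(paths, start=1):              # 1. Alternate Paths
        for j, protocol in enumerate(protocols, start=1):  # 2. Protocol Options
            # 3. Derived Endpoints: resolve only after the path is fixed,
            # because results may differ from one network to another.
            for k, address in enumerate(resolver(hostname, path), start=1):
                leaves.append((f"1.{i}.{j}.{k}", address, path, protocol))
    return leaves


for leaf in gather_leaves("www.example.com", ["Wi-Fi", "LTE"], ["TCP"]):
    print(leaf)
```

   Because resolution happens inside the path loop, the IPv6 address
   that is only returned on LTE never appears under the Wi-Fi branch.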

540	4.3.  Sorting Branches

542	   Implementations should sort the branches of the tree of connection
543	   options in order of their preference rank.  Leaf nodes on branches
544	   with higher rankings represent connection attempts that will be raced
545	   first.  Implementations should order the branches to reflect the
546	   preferences expressed by the application for its new connection,
547	   including Protocol and Path Selection Properties, which are specified
548	   in [I-D.trammell-taps-interface].  In addition to the properties
549	   provided by the application, an implementation may include additional
550	   criteria such as cached performance estimates, see Section 8.2, or
551	   system policy, see Section 3.2, in the ranking.  Two examples of how
552	   the Protocol and Path Selection Properties may be used to sort
553	   branches are provided below:

555	   o  Interface Type: If the application specifies an interface type to
556	      be preferred or avoided, implementations should rank paths
557	      accordingly.  If the application specifies an interface type to be
558	      required or prohibited, an implementation is expected to exclude
559	      the non-conforming paths from the tree.

561	   o  Capacity Profile: An implementation may use the Capacity Profile
562	      to prefer paths optimized for the application's expected traffic
563	      pattern according to cached performance estimates, see
564	      Section 8.2:

566	      *  Interactive/Low Latency: Prefer paths with the lowest expected
567	         Round Trip Time

569	      *  Constant Rate: Prefer paths that can satisfy the requested
570	         Stream Send or Stream Receive Bitrate, based on observed
571	         maximum throughput

573	      *  Scavenger/Bulk: Prefer paths with the highest expected
574	         available bandwidth, based on observed maximum throughput
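
   The two examples above can be sketched as a sort key.  This is a
   minimal illustration, not a prescribed algorithm: PathEstimate,
   rank_paths(), the profile strings, and the choice to rank the
   interface-type preference ahead of the capacity metric are all
   assumptions of this sketch.

```python
from dataclasses import dataclass


@dataclass
class PathEstimate:
    name: str
    interface_type: str
    rtt_ms: float                # cached expected round-trip time
    max_throughput_mbps: float   # cached observed maximum throughput


def rank_paths(paths, profile, prefer=None, prohibit=None,
               required_mbps=0.0):
    """Return paths sorted from most to least preferred."""
    # Prohibited interface types never enter the tree at all.
    candidates = [p for p in paths if p.interface_type != prohibit]

    def key(p):
        preferred = 0 if p.interface_type == prefer else 1
        if profile == "interactive":        # lowest expected RTT first
            metric = p.rtt_ms
        elif profile == "constant-rate":    # can the rate be satisfied?
            metric = 0 if p.max_throughput_mbps >= required_mbps else 1
        else:                               # "scavenger": most bandwidth
            metric = -p.max_throughput_mbps
        return (preferred, metric)

    return sorted(candidates, key=key)      # stable: ties keep input order


paths = [
    PathEstimate("en0", "wifi", rtt_ms=12.0, max_throughput_mbps=80.0),
    PathEstimate("pdp0", "cellular", rtt_ms=45.0, max_throughput_mbps=150.0),
]
print([p.name for p in rank_paths(paths, "interactive")])
print([p.name for p in rank_paths(paths, "scavenger")])
```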

576	   [Note: See Appendix A.1 for additional examples related to Properties
577	   under discussion.]

579	4.4.  Candidate Racing

581	   The primary goal of the Candidate Racing process is to successfully
582	   negotiate a protocol stack to an endpoint over an interface--to
583	   connect a single leaf node of the tree--with as little delay and as
584	   few unnecessary connection attempts as possible.  Optimizing these
585	   two factors improves the user experience, while minimizing network
586	   load.

588	   This section covers the dynamic aspect of connection establishment.
589	   While the tree described above is a useful conceptual and
590	   architectural model, an implementation does not know what the full
591	   tree may become up front, nor will many of the possible branches be
592	   used in the common case.

594	   There are three different approaches to racing the attempts for
595	   different nodes of the connection establishment tree:

597	   1.  Immediate

599	   2.  Delayed

601	   3.  Failover

603	   Each approach is appropriate in different use-cases and branch types.
604	   However, to avoid consuming unnecessary network resources,
605	   implementations should not use immediate racing as a default
606	   approach.

608	   The timing algorithms for racing should remain independent across
609	   branches of the tree.  Any timers or racing logic is isolated to a
610	   given parent node, and is not ordered precisely with regard to the
611	   children of other nodes.

613	4.4.1.  Delayed Racing

615	   Delayed racing can be used whenever a single node of the tree has
616	   multiple child nodes.  Based on the order determined when building
617	   the tree, the first child node will be initiated immediately,
618	   followed by the next child node after some delay.  Once that second
619	   child node is initiated, the third child node (if present) will begin
620	   after another delay, and so on until all child nodes have been
621	   initiated, or one of the child nodes successfully completes its
622	   negotiation.

624	   Delayed racing attempts occur in parallel.  Implementations should
625	   not terminate an earlier child connection attempt upon starting a
626	   secondary child.

628	   The delay between starting child nodes should be based on the
629	   properties of the previously started child node.  For example, if the
630	   first child represents an IP address with a known route, and the
631	   second child represents another IP address, the delay between
632	   starting the first and second IP addresses can be based on the
633	   expected retransmission cadence for the first child's connection
634	   (derived from historical round-trip-time).  Alternatively, if the
635	   first child represents a branch on a Wi-Fi interface, and the second
636	   child represents a branch on an LTE interface, the delay should be
637	   based on the expected time in which the branch for the first
638	   interface would be able to establish a connection, based on link
639	   quality and historical round-trip-time.

641	   Any delay should have a defined minimum and maximum value based on
642	   the branch type.  Generally, branches between paths and protocols
643	   should have longer delays than branches between derived endpoints.
644	   The maximum delay should be considered with regard to how long a
645	   user is expected to wait for the connection to complete.

647	   If a child node fails to connect before the delay timer has fired for
648	   the next child, the next child should be started immediately.
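
   The behaviour described in this section can be sketched with
   Python's asyncio.  The function race_delayed(), the helper
   attempt(), and the fixed delay values are hypothetical; a real
   implementation would derive each delay from the properties of the
   previously started child.  Cancelling the losing attempts on success
   is a choice made by this sketch.

```python
import asyncio


async def race_delayed(attempts, delays):
    """Race coroutine factories in preference order.

    delays[i] is the time to wait after starting attempt i before
    starting attempt i+1.  Earlier attempts keep running in parallel.
    """
    pending, errors = set(), []
    remaining = list(attempts)
    started = 0
    try:
        while remaining or pending:
            timeout = None
            if remaining:
                pending.add(asyncio.ensure_future(remaining.pop(0)()))
                if remaining:                 # a later child is waiting
                    timeout = delays[started]
                started += 1
            while pending:
                done, pending = await asyncio.wait(
                    pending, timeout=timeout,
                    return_when=asyncio.FIRST_COMPLETED)
                if not done:
                    break                     # delay timer fired
                winners = [t for t in done if t.exception() is None]
                errors.extend(t.exception() for t in done
                              if t.exception() is not None)
                if winners:
                    return winners[0].result()
                if remaining:
                    break                     # failure: start next child now
        raise ConnectionError(f"all {len(errors)} attempts failed")
    finally:
        for task in pending:                  # stop attempts that lost
            task.cancel()


async def attempt(name, latency, ok=True):
    """Stand-in for a connection attempt taking `latency` seconds."""
    await asyncio.sleep(latency)
    if not ok:
        raise ConnectionError(name)
    return name


async def demo():
    # The first child is slow; the second starts 50 ms later and wins.
    return await race_delayed(
        [lambda: attempt("wifi", 0.5), lambda: attempt("lte", 0.05)],
        delays=[0.05])


winner = asyncio.run(demo())
print(winner)
```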

650	4.4.2.  Failover

652	   If an implementation or application has a strong preference for one
653	   branch over another, the branching node may choose to wait until one
654	   child has failed before starting the next.  Failure of a leaf node is
655	   determined by its protocol negotiation failing or timing out; failure
656	   of a parent branching node is determined by all of its children
657	   failing.

659	   An example in which failover is recommended is a race between a
660	   protocol stack that uses a proxy and a protocol stack that bypasses
661	   the proxy.  Failover is useful in case the proxy is down or
662	   misconfigured, but any more aggressive type of racing may end up
663	   unnecessarily avoiding a proxy that was preferred by policy.
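
   A minimal sketch of failover, using the proxy example above.  The
   function connect_with_failover() and the two stand-in attempt
   functions are hypothetical names; no network traffic is generated.

```python
def connect_with_failover(attempts):
    """Try (name, callable) pairs strictly in order.

    The next candidate starts only after the previous one has failed,
    either by raising an error or by timing out (TimeoutError is a
    subclass of OSError).
    """
    errors = []
    for name, attempt in attempts:
        try:
            return name, attempt()
        except OSError as exc:
            errors.append((name, exc))
    raise ConnectionError(f"all candidates failed: {errors}")


def via_proxy():
    # Stand-in for a proxy that is down or misconfigured.
    raise ConnectionRefusedError("proxy unreachable")


def direct():
    # Stand-in for a successful direct connection.
    return "connected"


used, result = connect_with_failover(
    [("proxy", via_proxy), ("direct", direct)])
print(used, result)
```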

665	4.5.  Completing Establishment

667	   The process of connection establishment completes when one leaf node
668	   of the tree has completed negotiation with the remote endpoint
669	   successfully, or else all nodes of the tree have failed to connect.
670	   The first leaf node to complete its connection is then used by the
671	   application to send and receive data.

673	   It is useful to process success and failure throughout the tree by
674	   child nodes reporting to their parent nodes (towards the trunk of the
675	   tree).  For example, in the following case, if 1.1.1 fails to
676	   connect, it reports the failure to 1.1.  Since 1.1 has no other child
677	   nodes, it also has failed and reports that failure to 1.  Because 1.2
678	   has not yet failed, 1 is not considered to have failed.  Since 1.2
679	   has not yet started, it is started and the process continues.
680	   Similarly, if 1.1.1 successfully connects, then it marks 1.1 as
681	   connected, which propagates to the trunk node 1.  At this point, the
682	   connection as a whole is considered to be successfully connected and
683	   ready to process application data.

685	   1 [www.example.com:80, Any, TCP]
686	     1.1 [www.example.com:80, Wi-Fi, TCP]
687	       1.1.1 [192.0.2.1:80, Wi-Fi, TCP]
688	     1.2 [www.example.com:80, LTE, TCP]
689	   ...

691	   If a leaf node has successfully completed its connection, all other
692	   attempts should be made ineligible for use by the application for the
693	   original request.  New connection attempts that involve transmitting
694	   data on the network should not be started after another leaf node has
695	   completed successfully, as the connection as a whole has been
696	   established.  An implementation may choose to let certain handshakes
697	   and negotiations complete in order to gather metrics to influence
698	   future connections.  Similarly, an implementation may choose to hold
699	   onto fully established leaf nodes that were not the first to
700	   establish for use in future connections, but this approach is not
701	   recommended since those attempts were slower to connect and may
702	   exhibit less desirable properties.
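
   The reporting of success and failure towards the trunk of the tree
   can be sketched as follows, reusing the node labels from the example
   above.  The State enumeration and the RaceNode class are
   hypothetical names used only for illustration.

```python
from enum import Enum


class State(Enum):
    PENDING = "pending"
    CONNECTED = "connected"
    FAILED = "failed"


class RaceNode:
    def __init__(self, label, parent=None):
        self.label = label
        self.parent = parent
        self.children = []
        self.state = State.PENDING
        if parent is not None:
            parent.children.append(self)

    def report_success(self):
        # Success propagates all the way to the trunk node.
        node = self
        while node is not None:
            node.state = State.CONNECTED
            node = node.parent

    def report_failure(self):
        # A parent fails only once every one of its children has failed.
        self.state = State.FAILED
        parent = self.parent
        if parent is not None and all(
                child.state is State.FAILED for child in parent.children):
            parent.report_failure()


root = RaceNode("1")
wifi = RaceNode("1.1", root)
wifi_addr = RaceNode("1.1.1", wifi)
lte = RaceNode("1.2", root)

wifi_addr.report_failure()
print(wifi.state, root.state)   # 1.1 has failed, 1 is still pending
lte.report_success()
print(root.state)
```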

704	4.5.1.  Determining Successful Establishment

706	   Implementations may select the criteria by which a leaf node is
707	   considered to be successfully connected differently on a per-protocol
708	   basis.  If the only protocol being used is a transport protocol with
709	   a clear handshake, like TCP, then the obvious choice is to declare
710	   that node "connected" when the last packet of the three-way handshake
711	   has been received.  If the only protocol being used is an
712	   "unconnected" protocol, like UDP, the implementation may consider the
713	   node fully "connected" the moment it determines a route is present,
714	   before sending any packets on the network, see further Section 4.7.

716	   For protocol stacks with multiple handshakes, the decision becomes
717	   more nuanced.  If the protocol stack involves both TLS and TCP, an
718	   implementation could determine that a leaf node is connected after
719	   the TCP handshake is complete, or it can wait for the TLS handshake
720	   to complete as well.  The benefit of declaring completion when the
721	   TCP handshake finishes, and thus stopping the race for other branches
722	   of the tree, is that there will be less burden on the network from
723	   other connection attempts.  On the other hand, by waiting until the
724	   TLS handshake is complete, an implementation avoids the scenario in
725	   which a TCP handshake completes quickly, but TLS negotiation is
726	   either very slow or fails altogether in particular network conditions
727	   or to a particular endpoint.  To avoid the issue of TLS possibly
728	   failing, the implementation should not generate a Ready event for the
729	   Connection until TLS is established.

731	   If all of the leaf nodes fail to connect during racing, i.e. none of
732	   the configurations that satisfy all requirements given in the
733	   Transport Parameters actually work over the available paths, then the
734	   transport system should notify the application with an InitiateError
735	   event.  An InitiateError event should also be generated in case the
736	   transport system finds no usable candidates to race.

738	4.6.  Establishing multiplexed connections

740	   Multiplexing several Connections over a single underlying transport
741	   connection requires that the Connections to be multiplexed belong to
742	   the same Connection Group (as is indicated by the application using
743	   the Clone call).  When the underlying transport connection supports
744	   multi-streaming, the Transport System can map each Connection in the
745	   Connection Group to a different stream.  Thus, when the Connections
746	   that are offered to an application by the Transport System are
747	   multiplexed, the Transport System may implement the establishment of
748	   a new Connection by simply beginning to use a new stream of an
749	   already established transport connection and there is no need for a
750	   connection establishment procedure.  This, then, also means that
751	   there may not be any "establishment" message (like a TCP SYN), but
752	   the application can simply start sending or receiving.  Therefore,
753	   when the Initiate action of a Transport System is called without
754	   Messages being handed over, it cannot be guaranteed that the other
755	   endpoint will have any way to know about this, and hence a passive
756	   endpoint's ConnectionReceived event may not be called upon an active
757	   endpoint's Initiate.  Instead, calling the ConnectionReceived event
758	   may be delayed until the first Message arrives.

760	4.7.  Handling racing with "unconnected" protocols

762	   While protocols that use an explicit handshake to validate a
763	   Connection to a peer can be used for racing multiple establishment
764	   attempts in parallel, "unconnected" protocols such as raw UDP do not
765	   offer a way to validate the presence of a peer or the usability of a
766	   Connection without application feedback.  An implementation should
767	   consider such a protocol stack to be established as soon as a local
768	   route to the peer endpoint is confirmed.

770	   However, if a peer is not reachable over the network using the
771	   unconnected protocol, or data cannot be exchanged for any other
772	   reason, the application may want to attempt using another candidate
773	   Protocol Stack.  The implementation should maintain the list of other
774	   candidate Protocol Stacks that were eligible to use.  In the case
775	   that the application signals that the initial Protocol Stack is
776	   failing for some reason and that another option should be attempted,
777	   the Connection can be updated to point to the next candidate Protocol
778	   Stack.  This can be viewed as an application-driven form of Protocol
779	   Stack racing.
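
   One possible shape for this application-driven fallback is shown
   below.  The CandidateList class and its report_failure() method are
   hypothetical and are not part of the interface defined in
   [I-D.trammell-taps-interface].

```python
class CandidateList:
    """Ordered candidate Protocol Stacks kept for one Connection."""

    def __init__(self, stacks):
        if not stacks:
            raise ValueError("at least one candidate is required")
        self._stacks = list(stacks)
        self._index = 0

    @property
    def current(self):
        return self._stacks[self._index]

    def report_failure(self):
        """Called when the application signals the current stack fails."""
        if self._index + 1 >= len(self._stacks):
            raise ConnectionError("no candidate Protocol Stacks left")
        self._index += 1
        return self.current


candidates = CandidateList(["UDP over Wi-Fi", "UDP over LTE"])
print(candidates.current)
print(candidates.report_failure())
```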

781	4.8.  Implementing listeners

783	   When an implementation is asked to Listen, it registers with the
784	   system to wait for incoming traffic to the Local Endpoint.  If no
785	   Local Endpoint is specified, the implementation should either use an
786	   ephemeral port or generate an error.

788	   If the Path Selection Properties do not require a single network
789	   interface or path, but allow the use of multiple paths, the Listener
790	   object should register for incoming traffic on all of the network
791	   interfaces or paths that conform to the Path Selection Properties.
792	   The set of available paths can change over time, so the
793	   implementation should monitor network path changes and register and
794	   de-register the Listener across all usable paths.  When using
795	   multiple paths, the Listener is generally expected to use the same
796	   port for listening on each.

798	   If the Protocol Selection Properties allow multiple protocols to be
799	   used for listening, and the implementation supports it, the Listener
800	   object should register across the eligible protocols for each path.
801	   This means that inbound Connections delivered by the implementation
802	   may have heterogeneous protocol stacks.

804	4.8.1.  Implementing listeners for Connected Protocols

806	   Connected protocols such as TCP and TLS-over-TCP have a strong
807	   mapping between the Local and Remote Endpoints (five-tuple) and their
808	   protocol connection state.  These map well into Connection objects.
809	   Whenever a new inbound handshake is being started, the Listener
810	   should generate a new Connection object and pass it to the
811	   application.

813	4.8.2.  Implementing listeners for Unconnected Protocols

815	   Unconnected protocols such as UDP and UDP-Lite generally do not
816	   provide the same mechanisms that connected protocols do to offer
817	   Connection objects.  Implementations should wait for incoming packets
818	   for unconnected protocols on a listening port and should match each
819	   packet's five-tuple against existing Connection objects, creating a
820	   new Connection object when none matches.  On platforms with
821	   facilities to create a "virtual connection" for unconnected protocols
822	   implementations should use these mechanisms to minimise the handling
823	   of datagrams intended for already created Connection objects.
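
   Five-tuple matching for a datagram listener can be sketched with a
   dictionary keyed on the tuple.  DatagramConnection, DatagramListener,
   and handle_datagram() are hypothetical names; the sketch does not
   open any sockets, and the addresses are documentation values.

```python
class DatagramConnection:
    def __init__(self, five_tuple):
        self.five_tuple = five_tuple
        self.inbox = []


class DatagramListener:
    def __init__(self, local_addr, on_connection):
        self.local_addr = local_addr        # (ip, port) being listened on
        self.on_connection = on_connection  # invoked for new Connections
        self.connections = {}

    def handle_datagram(self, remote_addr, payload):
        # Protocol, local (ip, port), and remote (ip, port) together
        # form the five-tuple.
        key = ("udp", self.local_addr, remote_addr)
        conn = self.connections.get(key)
        if conn is None:
            conn = DatagramConnection(key)
            self.connections[key] = conn
            self.on_connection(conn)
        conn.inbox.append(payload)
        return conn


received = []
listener = DatagramListener(("192.0.2.10", 5000), received.append)
listener.handle_datagram(("198.51.100.7", 40000), b"a")
listener.handle_datagram(("198.51.100.7", 40000), b"b")
listener.handle_datagram(("198.51.100.8", 40001), b"c")
print(len(received))
```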

825	4.8.3.  Implementing listeners for Multiplexed Protocols

827	   Protocols that provide multiplexing of streams into a single five-
828	   tuple can listen both for entirely new connections (a new HTTP/2
829	   stream on a new TCP connection, for example) and for new sub-
830	   connections (a new HTTP/2 stream on an existing connection).  If the
831	   abstraction of Connection presented to the application is mapped to
832	   the multiplexed stream, then the Listener should deliver new
833	   Connection objects in the same way for either case.  The
834	   implementation should allow the application to introspect the
835	   Connection Group marked on the Connections to determine the grouping
836	   of the multiplexing.

838	5.  Implementing Data Transfer

840	5.1.  Data transfer for streams, datagrams, and frames

842	   The most basic mapping for sending a Message is an abstraction of
843	   datagrams, in which the transport protocol naturally deals in
844	   discrete packets.  Each Message here corresponds to a single
845	   datagram.  Generally, these will be short enough that sending and
846	   receiving will always use a complete Message.

848	   For protocols that expose byte-streams, the only delineation provided
849	   by the protocol is the end of the stream in a given direction.  Each
850	   Message in this case corresponds to the entire stream of bytes in a
851	   direction.  These Messages may be quite long, in which case they can
852	   be sent in multiple parts.

854	   Protocols that provide framing (such as length-value protocols,
855	   or protocols that use delimiters) provide data boundaries that may be
856	   longer than a traditional packet datagram.  Each Message for framing
857	   protocols corresponds to a single frame, which may be sent either as
858	   a complete Message, or in multiple parts.

860	5.1.1.  Sending Messages

862	   The effect of the application sending a Message is determined by the
863	   top-level protocol in the established Protocol Stack.  That is, if
864	   the top-level protocol provides an abstraction of framed messages
865	   over a connection, the receiving application will be able to obtain
866	   multiple Messages on that connection, even if the framing protocol is
867	   built on a byte-stream protocol like TCP.

869	5.1.1.1.  Send Parameters

871	   o  Lifetime: this should be implemented by removing the Message from
872	      its queue of pending Messages after the Lifetime has expired.  A
873	      queue of pending Messages within the transport system
874	      implementation that have yet to be handed to the Protocol Stack
875	      can always support this property, but once a Message has been sent
876	      into the send buffer of a protocol, only certain protocols may
877	      support de-queueing a message.  For example, TCP cannot remove
878	      bytes from its send buffer, while in case of SCTP, such control
879	      over the SCTP send buffer can be exercised using the partial
880	      reliability extension [RFC8303].  When there is no standing queue
881	      of Messages within the system, and the Protocol Stack does not
882	      support removing a Message from its buffer, this property may be
883	      ignored.

885	   o  Niceness: this represents the ability to de-prioritize a Message
886	      in favor of other Messages.  This can be implemented by the system
887	      re-ordering Messages that have yet to be handed to the Protocol
888	      Stack, or by giving relative priority hints to protocols that
889	      support priorities per Message.  For example, an implementation of
890	      HTTP/2 could choose to send Messages of different niceness on
891	      streams of different priority.

893	   o  Ordered: when this is false, it disables the requirement of in-
894	      order-delivery for protocols that support configurable ordering.

896	   o  Idempotent: when this is true, it means that the Message can be
897	      used by mechanisms that might transfer it multiple times - e.g.,
898	      as a result of racing multiple transports or as part of TCP Fast
899	      Open.

901	   o  Corruption Protection Length: when this is set to any value other
902	      than -1, it limits the required checksum in protocols that allow
903	      limiting the checksum length (e.g., UDP-Lite).

905	   o  Immediate Acknowledgement: this informs the implementation that
906	      the sender intends to execute tight control over the send buffer,
907	      and therefore wants to avoid delayed acknowledgements.  In case of
908	      SCTP, a request to immediately send acknowledgements can be
909	      implemented using the "sack-immediately flag" described in
910	      Section 4.2 of [RFC8303] for the SEND.SCTP primitive.

912	   o  Instantaneous Capacity Profile: when this is set to "Interactive/
913	      Low Latency", the Message should be sent immediately, even when
914	      this comes at the cost of using the network capacity less
915	      efficiently.  For example, small messages can sometimes be bundled
916	      to fit into a single data packet for the sake of reducing header
917	      overhead; such bundling should not be used.  In the case of TCP,
918	      this means that the Nagle algorithm should be disabled when
919	      Interactive/Low Latency is selected as the capacity profile.
920	      Scavenger/Bulk can translate into usage of a congestion control
921	      mechanism such as LEDBAT, and/or the capacity profile can lead to
922	      a choice of a DSCP value as described in [I-D.ietf-taps-minset].
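
   As an illustration of the Lifetime and Niceness parameters above, a
   queue of Messages not yet handed to the Protocol Stack could be kept
   as follows.  The PendingQueue class, the injected clock, and the
   convention that a lower niceness value is sent first are assumptions
   of this sketch.

```python
import heapq
import itertools


class PendingQueue:
    """Messages waiting to be handed to the Protocol Stack."""

    def __init__(self, clock):
        self._clock = clock             # callable returning seconds
        self._heap = []
        self._seq = itertools.count()   # keeps FIFO order per niceness

    def enqueue(self, data, niceness=0, lifetime=None):
        expires = None if lifetime is None else self._clock() + lifetime
        heapq.heappush(self._heap,
                       (niceness, next(self._seq), expires, data))

    def next_message(self):
        """Return the next unexpired Message, or None if none is left."""
        while self._heap:
            _, _, expires, data = heapq.heappop(self._heap)
            if expires is None or self._clock() < expires:
                return data
            # Lifetime expired while queued: drop the Message silently.
        return None


now = [0.0]                             # fake clock for the example
queue = PendingQueue(lambda: now[0])
queue.enqueue(b"bulk", niceness=10)
queue.enqueue(b"stale", niceness=0, lifetime=1.0)
queue.enqueue(b"urgent", niceness=0)
now[0] = 2.0                            # two seconds pass before sending
order = [queue.next_message() for _ in range(3)]
print(order)
```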

924	   [Note: See also Appendix A.2 for additional Send Parameters under
925	   discussion.]

927	5.1.1.2.  Send Completion

929	   The application should be notified whenever a Message or partial
930	   Message has been consumed by the Protocol Stack, or has failed to
931	   send.  The meaning of the Message being consumed by the stack may
932	   vary depending on the protocol.  For a basic datagram protocol like
933	   UDP, this may correspond to the time when the packet is sent into the
934	   interface driver.  For a protocol that buffers data in queues, like
935	   TCP, this may correspond to when the data has entered the send
936	   buffer.

938	5.1.2.  Receiving Messages

940	   Similar to sending, Receiving a Message is determined by the top-
941	   level protocol in the established Protocol Stack.  The main
942	   difference with Receiving is that the size and boundaries of the
943	   Message are not known beforehand.  The application can communicate in
944	   its Receive action the parameters for the Message, which can help the
945	   implementation know how much data to deliver and when.  For example,
946	   if the application only wants to receive a complete Message, the
947	   implementation should wait until an entire Message (datagram, stream,
948	   or frame) is read before delivering any Message content to the
949	   application.  This requires the implementation to understand where
950	   messages end, either via a supplied deframer or because the top-level
951	   protocol in the established Protocol Stack preserves message
952	   boundaries; if, on the other hand, the top-level protocol only
953	   supports a byte-stream and no deframer was supplied, the
954	   application must specify the minimum number of bytes of Message
955	   content it wants to receive (which may be just a single byte) to
956	   control the flow of received data.

958	   If a Connection becomes finished before a requested Receive action
959	   can be satisfied, the implementation should deliver any partial
960	   Message content outstanding, or if none is available, an indication
961	   that there will be no more received Messages.
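
   To illustrate how a supplied deframer lets an implementation deliver
   only complete Messages from a byte-stream, the sketch below uses a
   hypothetical framing of a 4-octet big-endian length followed by the
   payload.  The class name and the frame format are assumptions and
   are not defined by this document.

```python
import struct


class LengthPrefixDeframer:
    HEADER = struct.Struct("!I")    # 4-octet big-endian length

    def __init__(self):
        self._buffer = bytearray()

    def feed(self, chunk):
        """Add received bytes; return the complete Messages available."""
        self._buffer += chunk
        messages = []
        while len(self._buffer) >= self.HEADER.size:
            (length,) = self.HEADER.unpack_from(self._buffer)
            end = self.HEADER.size + length
            if len(self._buffer) < end:
                break               # wait for the rest of this Message
            messages.append(bytes(self._buffer[self.HEADER.size:end]))
            del self._buffer[:end]
        return messages


def frame(payload):
    return LengthPrefixDeframer.HEADER.pack(len(payload)) + payload


stream = frame(b"hello") + frame(b"hi")
deframer = LengthPrefixDeframer()
first = deframer.feed(stream[:6])   # header plus part of "hello"
second = deframer.feed(stream[6:])  # remainder of the byte-stream
print(first, second)
```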

963	5.2.  Handling of data for fast-open protocols

965	   Several protocols allow sending higher-level protocol or application
966	   data within the first packet of their protocol establishment, such as
967	   TCP Fast Open [RFC7413] and TLS 1.3 [I-D.ietf-tls-tls13].  This
968	   approach is referred to as sending Zero-RTT (0-RTT) data.  This is a
969	   desirable property, but poses challenges to an implementation that
970	   uses racing during connection establishment.

972	   If the application has 0-RTT data to send in any protocol handshakes,
973	   it needs to provide this data before the handshakes have begun.  When
974	   racing, this means that the data should be provided before the
975	   process of connection establishment has begun.  If the application
976	   wants to send 0-RTT data, it must indicate this to the implementation
977	   by setting the Idempotent send parameter to true when sending the
978	   data.  In general, 0-RTT data may be replayed (for example, if a TCP
979	   SYN contains data, and the SYN is retransmitted, the data will be
980	   retransmitted as well), but racing means that different leaf nodes
981	   have the opportunity to send the same data independently.  If data is
982	   truly idempotent, this should be permissible.

984	   Once the application has provided its 0-RTT data, an implementation
985	   should keep a copy of this data and provide it to each new leaf node
986	   that is started and for which a 0-RTT protocol is being used.

988	   It is also possible that protocol stacks within a particular leaf
989	   node use 0-RTT handshakes without any idempotent application data.
990	   For example, TCP Fast Open could use a Client Hello from TLS as its
991	   0-RTT data, shortening the cumulative handshake time.

993	   0-RTT handshakes often rely on previous state, such as TCP Fast Open
994	   cookies, previously established TLS tickets, or out-of-band
995	   distributed pre-shared keys (PSKs).  Implementations should be aware
996	   of security concerns around using these tokens across multiple
997	   addresses or paths when racing.  In the case of TLS, any given ticket
998	   or PSK should only be used on one leaf node.  If implementations have
999	   multiple tickets available from a previous connection, each leaf node
1000	   attempt must use a different ticket.  In effect, each leaf node will
1001	   send the same early application data, yet encoded (encrypted)
1002	   differently on the wire.
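
   The rule that each leaf node uses a different ticket can be sketched
   as a simple allocation step.  TicketPool and plan_zero_rtt() are
   hypothetical names, and the tickets are opaque placeholders rather
   than real TLS session tickets.

```python
class TicketPool:
    """Single-use tickets left over from a previous connection."""

    def __init__(self, tickets):
        self._available = list(tickets)

    def take(self):
        return self._available.pop(0) if self._available else None


def plan_zero_rtt(leaves, pool, early_data, idempotent):
    """Map each leaf to (ticket, early data), never reusing a ticket."""
    plan = {}
    for leaf in leaves:
        ticket = pool.take() if idempotent else None
        # Without a ticket the leaf performs a full handshake and
        # sends no early data.
        plan[leaf] = (ticket, early_data if ticket is not None else None)
    return plan


pool = TicketPool(["ticket-A", "ticket-B"])
plan = plan_zero_rtt(["1.1.1", "1.1.2", "1.2.1"], pool,
                     early_data=b"GET /", idempotent=True)
print(plan)
```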

1004	6.  Implementing Maintenance

1006	   Maintenance encompasses changes that the application can request to a
1007	   Connection, or that a Connection can react to based on system and
1008	   network changes.

1010	6.1.  Changing Protocol Properties

1012	   Appendix A.1 of [I-D.ietf-taps-minset] explains, using primitives
1013	   that are described in [RFC8303] and [RFC8304], how to implement
1014	   changing the following protocol properties of an established
1015	   connection with TCP and UDP.  Below, we amend this description for
1016	   other protocols (if applicable):

1018	   o  Relative niceness: for SCTP, this can be done using the primitive
1019	      CONFIGURE_STREAM_SCHEDULER.SCTP described in section 4 of
1020	      [RFC8303].

1022	   o  Timeout for aborting Connection: for SCTP, this can be done using
1023	      the primitive CHANGE_TIMEOUT.SCTP described in section 4 of
1024	      [RFC8303].

1026	   o  Abort timeout to suggest to the Remote Endpoint: for TCP, this can
1027	      be done using the primitive CHANGE_TIMEOUT.TCP described in
1028	      section 4 of [RFC8303].

1030	   o  Retransmission threshold before excessive retransmission
1031	      notification: for TCP, this can be done using ERROR.TCP described
1032	      in section 4 of [RFC8303].

1034	   o  Required minimum coverage of the checksum for receiving: for UDP-
1035	      Lite, this can be done using the primitive
1036	      SET_MIN_CHECKSUM_COVERAGE.UDP-Lite described in section 4 of
1037	      [RFC8303].

1039	   o  Connection group transmission scheduler: for SCTP, this can be
1040	      done using the primitive SET_STREAM_SCHEDULER.SCTP described in
1041	      section 4 of [RFC8303].

1043	   It may happen that the application attempts to set a Protocol
1044	   Property which does not apply to the actually chosen protocol.  In
1045	   this case, the implementation should fail gracefully, i.e., it may
1046	   give a warning to the application, but it should not terminate the
1047	   Connection.

1049	6.2.  Handling Path Changes

1051	   When a path change occurs, the Transport Services implementation is
1052	   responsible for notifying Protocol Instances in the Protocol Stack.
1053	   If the Protocol Stack includes a transport protocol that supports
1054	   multipath connectivity, an update to the available paths should
1055	   inform the Protocol Instance of the new set of paths that are
1056	   permissible based on the Path Selection Properties passed by the
1057	   application.  A multipath protocol can establish new subflows over
1058	   new paths, and should tear down subflows over paths that are no
1059	   longer available.  If the Protocol Stack includes a transport
1060	   protocol that does not support multipath, but supports migrating
1061	   between paths, the update to available paths can be used as the
1062	   trigger for migrating the connection.  For protocols that do not
1063	   support multipath or migration, the Protocol Instances may be
1064	   informed of the path change, but should not be forcibly disconnected
1065	   if the previously used path becomes unavailable.  An exception to
1066	   this case is if the System Policy changes to prohibit traffic from
1067	   the Connection based on its properties, in which case the Protocol
1068	   Stack should be disconnected.
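
   The cases above can be summarized as a small decision function.  The
   Capability enumeration, react_to_path_change(), and the action
   strings are hypothetical names chosen for this sketch; establishing
   a subflow on every new path is one possible policy, not a
   requirement.

```python
from enum import Enum


class Capability(Enum):
    MULTIPATH = 1
    MIGRATION = 2
    NONE = 3


def react_to_path_change(capability, in_use, usable, policy_allows=True):
    """Return the actions to take after the set of usable paths changes.

    in_use: paths the Protocol Stack currently uses.
    usable: paths now permitted by the Path Selection Properties.
    """
    if not policy_allows:
        return ["disconnect"]       # System Policy now prohibits traffic
    if capability is Capability.MULTIPATH:
        return ([f"add-subflow:{p}" for p in usable if p not in in_use] +
                [f"remove-subflow:{p}" for p in in_use if p not in usable])
    if capability is Capability.MIGRATION:
        if usable and not any(p in usable for p in in_use):
            return [f"migrate:{usable[0]}"]
        return []
    # Neither multipath nor migration: inform, but never disconnect.
    return ["notify"]


print(react_to_path_change(Capability.MULTIPATH, ["wifi"], ["wifi", "lte"]))
print(react_to_path_change(Capability.MIGRATION, ["wifi"], ["lte"]))
print(react_to_path_change(Capability.NONE, ["wifi"], []))
```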

1070	7.  Implementing Termination

1072	   With TCP, when an application closes a connection, this means that it
1073	   has no more data to send (but expects all data that has been handed
1074	   over to be reliably delivered).  However, with TCP only, "close" does
1075	   not mean that the application will stop receiving data.  This is
1076	   related to TCP's ability to support half-closed connections.

1078	   SCTP is an example of a protocol that does not support such half-
1079	   closed connections.  Hence, with SCTP, the meaning of "close" is
1080	   stricter: an application has no more data to send (but expects all
1081	   data that has been handed over to be reliably delivered), and will
1082	   also not receive any more data.

1084	   Implementing a protocol-independent transport system means that the
1085	   exposed semantics must be the strictest subset of the semantics of
1086	   all supported protocols.  Hence, as is common with all reliable
1087	   transport protocols, after a Close action, the application can expect
1088	   to have its reliability requirements honored regarding the data it
1089	   has given to the Transport System, but it cannot expect to be able to
1090	   read any more data after calling Close.

1092	   Abort differs from Close only in that no guarantees are given
1093	   regarding data that the application has handed over to the Transport
1094	   System before calling Abort.

1096	   As explained in Section 4.6, when a new stream is multiplexed
1097	   on an already existing connection of a Transport Protocol Instance,
1098	   there is no need for a connection establishment procedure.  Because
1099	   the Connections that are offered by the Transport System can be
1100	   implemented as streams that are multiplexed on a transport protocol's
1101	   connection, it can therefore not be guaranteed that one Endpoint's
1102	   Initiate action provokes a ConnectionReceived event at its peer.

1104	   For Close (provoking a Finished event) and Abort (provoking a
1105	   ConnectionError event), the same logic applies: while it is desirable
1106	   to be informed when a peer closes or aborts a Connection, whether
1107	   this is possible depends on the underlying protocol, and no
1108	   guarantees can be given.  With SCTP, the transport system can use the
1109	   stream reset procedure to cause a Finished event upon a Close action
1110	   from the peer [NEAT-flow-mapping].

1112	8.  Cached State

1114	   Beyond a single Connection's lifetime, it is useful for an
1115	   implementation to keep state and history.  This cached state can help
1116	   improve future Connection establishment due to re-using results and
1117	   credentials, and favoring paths and protocols that performed well in
1118	   the past.

1120	   Cached state may be associated with different Endpoints for the same
1121	   Connection, depending on the protocol generating the cached content.
1122	   For example, session tickets for TLS are associated with specific
1123	   endpoints, and thus should be cached based on a Connection's hostname
1124	   Endpoint (if applicable).  On the other hand, performance
1125	   characteristics of a path are more likely tied to the IP address and
1126	   subnet being used.

1128	8.1.  Protocol state caches

1130	   Some protocols will have long-term state to be cached in association
1131	   with Endpoints.  This state often has some time after which it is
1132	   expired, so the implementation should allow each protocol to specify
1133	   an expiration for cached content.

1135	   Examples of cached protocol state include:

1137	   o  The DNS protocol can cache resolution answers (A and AAAA queries,
1138	      for example), associated with a Time To Live (TTL) to be used for
1139	      future hostname resolutions without querying the DNS resolver
1140	      again.

1142	   o  TLS caches session state and tickets based on a hostname, which
1143	      can be used for resuming sessions with a server.

1145	   o  TCP can cache cookies for use in TCP Fast Open.

1147	   Cached protocol state is primarily used during Connection
1148	   establishment for a single Protocol Stack, but may be used to
1149	   influence an implementation's preference between several candidate
1150	   Protocol Stacks.  For example, if two IP address Endpoints are
1151	   otherwise equally preferred, an implementation may choose to attempt
1152	   a connection to an address for which it has a TCP Fast Open cookie.

1154	   Applications must have a way to flush protocol cache state if
1155	   desired.  This may be necessary, for example, if application-layer
1156	   identifiers rotate and clients wish to avoid linkability via
1157	   trackable TLS tickets or TFO cookies.
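
   The requirements above (a per-protocol expiration and a way for the
   application to flush state) can be illustrated with a minimal
   sketch.  The class and method names below are hypothetical and are
   not part of any TAPS interface; the cache is assumed to be keyed by
   a (protocol, endpoint) pair and to take its lifetime from the
   protocol that stores the entry.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional, Tuple


@dataclass
class CacheEntry:
    value: Any
    expires_at: float  # absolute time, in seconds, on the cache's clock


class ProtocolStateCache:
    """Hypothetical protocol state cache keyed by (protocol, endpoint)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[Tuple[str, str], CacheEntry] = {}

    def put(self, protocol: str, endpoint: str, value: Any,
            lifetime: float) -> None:
        # The protocol chooses the lifetime, e.g. a DNS TTL or a ticket
        # lifetime.
        expires_at = self._clock() + lifetime
        self._entries[(protocol, endpoint)] = CacheEntry(value, expires_at)

    def get(self, protocol: str, endpoint: str) -> Optional[Any]:
        key = (protocol, endpoint)
        entry = self._entries.get(key)
        if entry is None:
            return None
        if self._clock() >= entry.expires_at:
            # Expired entries are dropped lazily, on lookup.
            del self._entries[key]
            return None
        return entry.value

    def flush(self, protocol: Optional[str] = None) -> None:
        # Flush everything, or only the state stored by one protocol.
        if protocol is None:
            self._entries.clear()
        else:
            self._entries = {k: v for k, v in self._entries.items()
                             if k[0] != protocol}
```

   An application that rotates its identifiers could, under these
   assumptions, call flush("tls") to discard cached tickets while
   keeping DNS answers.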

1159	8.2.  Performance caches

1161	   In addition to protocol state, Protocol Instances should provide data
1162	   into a performance-oriented cache to help guide future protocol and
1163	   path selection.  Some performance information can be gathered
1164	   generically across several protocols to allow predictive comparisons
1165	   between protocols on given paths:

1167	   o  Observed Round Trip Time

1169	   o  Connection Establishment latency

1171	   o  Connection Establishment success rate

1173	   These items can be cached on a per-address and per-subnet
1174	   granularity, and averaged between different values.  The information
1175	   should be cached on a per-network basis, since it is expected that
1176	   different network attachments will have different performance
1177	   characteristics.  Besides Protocol Instances, other system entities
1178	   may also provide data into performance-oriented caches.  This could
1179	   for instance be signal strength information reported by radio modems
1180	   like Wi-Fi and mobile broadband or information about the battery-
1181	   level of the device.  Furthermore, the system may cache the observed
1182	   maximum throughput on a path as an estimate of the available
1183	   bandwidth.

1185	   An implementation should use this information, when possible, to
1186	   determine preference between candidate paths, endpoints, and protocol
1187	   options.  Eligible options that historically had significantly better
1188	   performance than others should be selected first when gathering
1189	   candidates (see Section 4.1) to ensure better performance for the
1190	   application.

1192	   The reasonable lifetime for cached performance values will vary
1193	   depending on the nature of the value.  Certain information, like the
1194	   connection establishment success rate to a Remote Endpoint using a
1195	   given protocol stack, can be stored for a long period of time (hours
1196	   or longer), since it is expected that the capabilities of the Remote
1197	   Endpoint are not changing very quickly.  On the other hand, Round
1198	   Trip Time observed by TCP over a particular network path may vary
1199	   over a relatively short time interval.  For such values, the
1200	   implementation should remove them from the cache more quickly, or
1201	   treat older values with less confidence/weight.
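
   One possible way to treat older values with less weight is an
   age-weighted average.  The sketch below is an illustration only:
   the exponential decay and the half-life parameter are assumptions,
   not something this document prescribes, and the function name is
   hypothetical.

```python
from typing import Iterable, Optional, Tuple


def weighted_estimate(samples: Iterable[Tuple[float, float]],
                      now: float,
                      half_life: float) -> Optional[float]:
    """Average (timestamp, value) samples, halving a sample's weight
    for every half_life seconds of age."""
    total_weight = 0.0
    weighted_sum = 0.0
    for timestamp, value in samples:
        age = max(0.0, now - timestamp)
        weight = 0.5 ** (age / half_life)
        total_weight += weight
        weighted_sum += weight * value
    if total_weight == 0.0:
        return None  # no usable samples
    return weighted_sum / total_weight
```

   A short half-life would suit quickly changing values such as Round
   Trip Time, while a long half-life would suit values such as the
   connection establishment success rate.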

1203	9.  Specific Transport Protocol Considerations

1205	9.1.  TCP

1207	   Connection lifetime for TCP translates fairly simply into the
1208	   abstraction presented to an application.  When the TCP three-way
1209	   handshake is complete, its layer of the Protocol Stack can be
1210	   considered Ready (established).  This event will cause racing of
1211	   Protocol Stack options to complete if TCP is the top-level protocol,
1212	   at which point the application can be notified that the Connection is
1213	   Ready to send and receive.

1215	   If the application sends a Close, that can translate to a graceful
1216	   termination of the TCP connection, which is performed by sending a
1217	   FIN to the remote endpoint.  If the application sends an Abort, then
1218	   the TCP state can be closed abruptly, leading to a RST being sent to
1219	   the peer.

1221	   Without a layer of framing (a top-level protocol in the established
1222	   Protocol Stack that preserves message boundaries, or an application-
1223	   supplied deframer) on top of TCP, the receiver side of the transport
1224	   system implementation can only treat the incoming stream of bytes as
1225	   a single Message, terminated by a FIN when the Remote Endpoint closes
1226	   the Connection.
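
   As an illustration of an application-supplied deframer, the sketch
   below recovers Messages from a TCP byte stream.  The framing it
   uses (a 4-octet big-endian length prefix before each Message) is an
   assumption made for this example and is not defined by this
   document; the class name is hypothetical.

```python
import struct
from typing import List


class LengthPrefixedDeframer:
    """Splits a byte stream into Messages using a 4-octet length prefix."""

    HEADER = struct.Struct("!I")  # unsigned 32-bit, network byte order

    def __init__(self) -> None:
        self._buffer = bytearray()

    def feed(self, data: bytes) -> List[bytes]:
        """Add received bytes; return every Message that is now complete."""
        self._buffer.extend(data)
        messages: List[bytes] = []
        while len(self._buffer) >= self.HEADER.size:
            (length,) = self.HEADER.unpack_from(self._buffer)
            end = self.HEADER.size + length
            if len(self._buffer) < end:
                break  # wait for the rest of this Message
            messages.append(bytes(self._buffer[self.HEADER.size:end]))
            del self._buffer[:end]
        return messages
```

   With such a deframer in place, the receiver can deliver each
   Message as soon as its last byte arrives instead of waiting for the
   FIN.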

1228	9.2.  UDP

1230	   UDP as a direct transport does not provide any handshake or
1231	   connectivity state, so the notion of the transport protocol becoming
1232	   Ready or established is degenerate.  Once the system has validated
1233	   that there is a route on which to send and receive UDP datagrams, the
1234	   protocol is considered Ready.  Similarly, a Close or Abort has no
1235	   meaning to the on-the-wire protocol, but simply leads to the local
1236	   state being torn down.

1238	   When sending and receiving messages over UDP, each Message should
1239	   correspond to a single UDP datagram.  The Message can contain
1240	   metadata about the packet, such as the ECN bits applied to the
1241	   packet.

1243	9.3.  SCTP

1245	   To support stream schedulers, which are implemented on the sender
1246	   side, a receiver-side Transport System should always support
1247	   message interleaving [RFC8260].

1249	   SCTP messages can be very large.  To allow the reception of large
1250	   messages in pieces, a "partial flag" can be used to inform a (native
1251	   SCTP) receiving application that a message is incomplete.  After
1252	   receiving the "partial flag", this application would know that the
1253	   next receive calls will only deliver remaining parts of the same
1254	   message (i.e., no messages or partial messages will arrive on other
1255	   streams until the message is complete) (see Section 8.1.20 in
1256	   [RFC6458]).  The "partial flag" can therefore facilitate the
1257	   implementation of the receiver buffer in the receiving application,
1258	   at the cost of limiting multiplexing and temporarily creating head-
1259	   of-line blocking delay at the receiver.

1261	   When a Transport System transfers a Message, it seems natural to map
1262	   the Message object to SCTP messages in order to support properties
1263	   such as "Ordered" or "Lifetime" (which maps onto partially reliable
1264	   delivery with a SCTP_PR_SCTP_TTL policy [RFC6458]).  However, since
1265	   multiplexing of Connections onto SCTP streams may happen, and would
1266	   be hidden from the application, the Transport System requires a per-
1267	   stream receiver buffer anyway, so this potential benefit is lost and
1268	   the "partial flag" becomes unnecessary for the system.

1270	   The problem of long messages either requiring large receiver-side
1271	   buffers or getting in the way of multiplexing is addressed by message
1272	   interleaving [RFC8260], which is yet another reason why a receiver-
1273	   side transport system supporting SCTP should implement this
1274	   mechanism.

1276	9.4.  TLS

1278	   The mapping of a TLS stream abstraction into the application is
1279	   equivalent to the contract provided by TCP (see Section 9.1).  The
1280	   Ready state should be determined by the completion of the TLS
1281	   handshake, which involves potentially several more round trips beyond
1282	   the TCP handshake.  The application should not be notified that the
1283	   Connection is Ready until TLS is established.

1285	9.5.  HTTP

1287	   HTTP requests and responses map naturally into Messages, since they
1288	   are delineated chunks of data with metadata that can be sent over a
1289	   transport.  To that end, HTTP can be seen as the most prevalent
1290	   framing protocol that runs on top of streams like TCP, TLS, etc.

1292	   In order to use a transport Connection that provides HTTP Message
1293	   support, the establishment and closing of the connection can be
1294	   treated as it would without the framing protocol.  Sending and
1295	   receiving of Messages, however, changes to treat each Message as a
1296	   well-delineated HTTP request or response, with the content of the
1297	   Message representing the body, and the Headers being provided in
1298	   Message metadata.

1300	9.6.  QUIC

1302	   QUIC provides a multi-streaming interface to an encrypted transport.
1303	   Each stream can be viewed as equivalent to a TLS stream over TCP, so
1304	   a natural mapping is to present each QUIC stream as an individual
1305	   Connection.  The protocol for the stream will be considered Ready
1306	   whenever the underlying QUIC connection is established to the point
1307	   that this stream's data can be sent.  For streams after the first
1308	   stream, this will likely be an immediate operation.

1310	   Closing a single QUIC stream, presented to the application as a
1311	   Connection, does not imply closing the underlying QUIC connection
1312	   itself.  Rather, the implementation may choose to close the QUIC
1313	   connection once all streams have been closed (possibly after some
1314	   timeout), or after an individual stream Connection sends an Abort.

1316	   Messages over a direct QUIC stream should be represented similarly to
1317	   the TCP stream (one Message per direction, see Section 9.1), unless a
1318	   framing mapping is used on top of QUIC.

1320	9.7.  HTTP/2 transport

1322	   Similar to QUIC (Section 9.6), HTTP/2 provides a multi-streaming
1323	   interface.  This will generally use HTTP as the unit of Messages over
1324	   the streams, in which each stream can be represented as a transport
1325	   Connection.  The lifetime of streams and the HTTP/2 connection should
1326	   be managed as described for QUIC.

1328	   It is possible to treat each HTTP/2 stream as a raw byte-stream
1329	   instead of a carrier for HTTP messages, in which case the Messages
1330	   over the streams can be represented similarly to the TCP stream (one
1331	   Message per direction, see Section 9.1).

1333	10.  Rendezvous and Environment Discovery

1335	   The connection establishment process outlined in Section 4 is
1336	   appropriate for client-server connections, but needs to be expanded
1337	   in peer-to-peer Rendezvous scenarios, as follows:

1339	   o  Gathering Local Endpoint candidates

1341	      The set of possible Local Endpoints is gathered.  In the simple
1342	      case, this merely enumerates the local interfaces and protocols
1343	      and allocates ephemeral source ports.  For example, a system that
1344	      has WiFi and Ethernet and supports IPv4 and IPv6 might gather four
1345	      candidate locals (IPv4 on Ethernet, IPv6 on Ethernet, IPv4 on
1346	      WiFi, and IPv6 on WiFi) that can form the source for a transient.

1348	      If NAT traversal is required, the process of gathering Local
1349	      Endpoints becomes broadly equivalent to the ICE candidate
1350	      gathering phase [RFC5245].  The endpoint determines its server
1351	      reflexive Local Endpoints (i.e., the translated address of a
1352	      local, on the other side of a NAT) and relayed locals (e.g., via a
1353	      TURN server or other relay), for each interface and network
1354	      protocol.  These are added to the set of candidate Local Endpoints
1355	      for this connection.

1357	      Gathering locals is primarily an endpoint local operation,
1358	      although it might involve exchanges with a STUN server to derive
1359	      server reflexive locals, or with a TURN server or other relay to
1360	      derive relayed locals.  It does not involve communication with the
1361	      Remote Endpoint.

1363	   o  Gathering Remote Endpoint Candidates

1365	      The Remote Endpoint is typically a name that needs to be resolved
1366	      into a set of possible addresses that can be used for
1367	      communication.  Resolving the Remote Endpoint is the process of
1368	      recursively performing such name lookups, until fully resolved, to
1369	      return the set of candidates for the remote of this connection.

1371	      How this is done will depend on the type of the Remote Endpoint,
1372	      and can also be specific to each Local Endpoint.  A common case is
1373	      when the Remote Endpoint is a DNS name, in which case it is
1374	      resolved to give a set of IPv4 and IPv6 addresses representing
1375	      that name.  Some types of remote might require more complex
1376	      resolution.  Resolving the Remote Endpoint for a peer-to-peer
1377	      connection might involve communication with a rendezvous server,
1378	      which in turn contacts the peer to gain consent to communicate and
1379	      retrieve its set of candidate locals, which are returned and form
1380	      the candidate remote addresses for contacting that peer.

1382	      Resolving the remote is _not_ a local operation.  It will involve
1383	      a directory service, and can require communication with the remote
1384	      to rendezvous and exchange peer addresses.  This can expose some
1385	      or all of the candidate locals to the remote.

1387	   o  Establishing Connections

1389	      The set of candidate Local Endpoints and the set of candidate
1390	      Remote Endpoints are paired, to derive a priority ordered set of
1391	      Candidate Paths that can potentially be used to establish a
1392	      Connection.

1394	      Then, communication is attempted over each candidate path, in
1395	      priority order.  If there are multiple candidates with the same
1396	      priority, then connection establishment proceeds simultaneously
1397	      and uses the transient that wins the race to be established.
1398	      Otherwise, connection establishment is sequential, paced at a rate
1399	      that should not congest the network.  Depending on the chosen
1400	      transport, this phase might involve racing TCP connections to a
1401	      server over IPv4 and IPv6 [RFC8305], or it could involve a STUN
1402	      exchange to establish peer-to-peer UDP connectivity [RFC5245], or
1403	      some other means.

1405	   o  Confirming and Maintaining Connections

1407	      Once connectivity has been established, unused resources can be
1408	      released and the chosen path can be confirmed.  This is primarily
1409	      required when establishing peer-to-peer connectivity, where
1410	      connections supporting relayed locals that were not required can
1411	      be closed, and where an associated signalling operation might be
1412	      needed to inform middleboxes and proxies of the chosen path.
1413	      Keep-alive messages may also be sent, as appropriate, to ensure
1414	      NAT and firewall state is maintained, so the Connection remains
1415	      operational.
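
   The pairing and ordering step described under "Establishing
   Connections" can be sketched as follows.  This is a simplified
   illustration: the preference values, the rule that a pair's
   priority is the sum of both preferences, and all names are
   assumptions made for the example, and the sketch does not implement
   the ICE priority formula.

```python
from dataclasses import dataclass
from itertools import groupby, product
from typing import List


@dataclass(frozen=True)
class Candidate:
    address: str
    family: str      # "ipv4" or "ipv6"
    preference: int  # higher is better; hypothetical local scoring


@dataclass(frozen=True)
class CandidatePath:
    local: Candidate
    remote: Candidate

    @property
    def priority(self) -> int:
        # Assumed scoring rule, for illustration only.
        return self.local.preference + self.remote.preference


def racing_batches(local_candidates: List[Candidate],
                   remote_candidates: List[Candidate]
                   ) -> List[List[CandidatePath]]:
    """Pair locals with remotes of the same address family and group
    the pairs by priority, highest first."""
    paths = [CandidatePath(l, r)
             for l, r in product(local_candidates, remote_candidates)
             if l.family == r.family]
    paths.sort(key=lambda p: p.priority, reverse=True)
    return [list(group)
            for _, group in groupby(paths, key=lambda p: p.priority)]
```

   Each inner list holds paths of equal priority that would be
   attempted simultaneously; the outer list gives the sequential order
   between batches.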

1417	   To support ICE, or similar protocols, that involve an out-of-band
1418	   indirect signalling exchange to exchange candidates with the Remote
1419	   Endpoint, it's important to be able to query the set of candidate
1420	   Local Endpoints, and give the protocol stack a set of candidate
1421	   Remote Endpoints, before it attempts to establish connections.

1423	   (TO-DO: It is expected that a single abstract algorithm can be
1424	   identified that supports both the peer-to-peer and client-server
1425	   connection racing, allowing this text to be merged with Section 4)

1427	11.  IANA Considerations

1429	   RFC-EDITOR: Please remove this section before publication.

1431	   This document has no actions for IANA.

1433	12.  Security Considerations

1435	12.1.  Considerations for Candidate Gathering

1437	   Implementations should avoid downgrade attacks that allow network
1438	   interference to cause the implementation to select less secure, or
1439	   entirely insecure, combinations of paths and protocols.

1441	12.2.  Considerations for Candidate Racing

1443	   See Section 5.2 for security considerations around racing with 0-RTT
1444	   data.

1446	   An attacker that knows a particular device is racing several options
1447	   during connection establishment may be able to block packets for the
1448	   first connection attempt, thus inducing the device to fall back to a
1449	   secondary attempt.  This is a problem if the secondary attempts have
1450	   worse security properties that enable further attacks.
1451	   Implementations should ensure that all options have equivalent
1452	   security properties to avoid incentivizing attacks.

1454	   Since results from the network can determine how a connection attempt
1455	   tree is built, such as when DNS returns a list of resolved endpoints,
1456	   it is possible for the network to cause an implementation to consume
1457	   significant on-device resources.  Implementations should limit the
1458	   maximum amount of state allowed for any given node, including the
1459	   number of child nodes, especially when the state is based on results
1460	   from the network.

1462	13.  Acknowledgements

1464	   This work has received funding from the European Union's Horizon 2020
1465	   research and innovation programme under grant agreement No. 644334
1466	   (NEAT).

1468	   This work has been supported by Leibniz Prize project funds of DFG -
1469	   German Research Foundation: Gottfried Wilhelm Leibniz-Preis 2011 (FKZ
1470	   FE 570/4-1).

1472	   This work has been supported by the UK Engineering and Physical
1473	   Sciences Research Council under grant EP/R04144X/1.

1475	   Thanks to Stuart Cheshire, Josh Graessley, David Schinazi, and Eric
1476	   Kinnear for their implementation and design efforts, including Happy
1477	   Eyeballs, that heavily influenced this work.

1479	14.  References

1481	14.1.  Normative References

1483	   [I-D.ietf-taps-minset]
1484	              Welzl, M. and S. Gjessing, "A Minimal Set of Transport
1485	              Services for TAPS Systems", draft-ietf-taps-minset-02
1486	              (work in progress), February 2018.

1488	   [I-D.pauly-taps-arch]
1489	              Pauly, T., Trammell, B., Brunstrom, A., Fairhurst, G.,
1490	              Perkins, C., Tiesel, P., and C. Wood, "An Architecture for
1491	              Transport Services", draft-pauly-taps-arch-00 (work in
1492	              progress), February 2018.

1494	   [I-D.trammell-taps-interface]
1495	              Trammell, B., Welzl, M., Enghardt, T., Fairhurst, G.,
1496	              Kuehlewind, M., Perkins, C., Tiesel, P., and C. Wood, "An
1497	              Abstract Application Layer Interface to Transport
1498	              Services", draft-trammell-taps-interface-00 (work in
1499	              progress), March 2018.

1501	   [RFC6458]  Stewart, R., Tuexen, M., Poon, K., Lei, P., and V.
1502	              Yasevich, "Sockets API Extensions for the Stream Control
1503	              Transmission Protocol (SCTP)", RFC 6458,
1504	              DOI 10.17487/RFC6458, December 2011,
1505	              <https://www.rfc-editor.org/info/rfc6458>.

1507	   [RFC7413]  Cheng, Y., Chu, J., Radhakrishnan, S., and A. Jain, "TCP
1508	              Fast Open", RFC 7413, DOI 10.17487/RFC7413, December 2014,
1509	              <https://www.rfc-editor.org/info/rfc7413>.

1511	   [RFC7540]  Belshe, M., Peon, R., and M. Thomson, Ed., "Hypertext
1512	              Transfer Protocol Version 2 (HTTP/2)", RFC 7540,
1513	              DOI 10.17487/RFC7540, May 2015,
1514	              <https://www.rfc-editor.org/info/rfc7540>.

1516	   [RFC8260]  Stewart, R., Tuexen, M., Loreto, S., and R. Seggelmann,
1517	              "Stream Schedulers and User Message Interleaving for the
1518	              Stream Control Transmission Protocol", RFC 8260,
1519	              DOI 10.17487/RFC8260, November 2017,
1520	              <https://www.rfc-editor.org/info/rfc8260>.

1522	   [RFC8303]  Welzl, M., Tuexen, M., and N. Khademi, "On the Usage of
1523	              Transport Features Provided by IETF Transport Protocols",
1524	              RFC 8303, DOI 10.17487/RFC8303, February 2018,
1525	              <https://www.rfc-editor.org/info/rfc8303>.

1527	   [RFC8304]  Fairhurst, G. and T. Jones, "Transport Features of the
1528	              User Datagram Protocol (UDP) and Lightweight UDP (UDP-
1529	              Lite)", RFC 8304, DOI 10.17487/RFC8304, February 2018,
1530	              <https://www.rfc-editor.org/info/rfc8304>.

1532	   [RFC8305]  Schinazi, D. and T. Pauly, "Happy Eyeballs Version 2:
1533	              Better Connectivity Using Concurrency", RFC 8305,
1534	              DOI 10.17487/RFC8305, December 2017,
1535	              <https://www.rfc-editor.org/info/rfc8305>.

1537	14.2.  Informative References

1539	   [I-D.ietf-quic-transport]
1540	              Iyengar, J. and M. Thomson, "QUIC: A UDP-Based Multiplexed
1541	              and Secure Transport", draft-ietf-quic-transport-10 (work
1542	              in progress), March 2018.

1544	   [I-D.ietf-tls-tls13]
1545	              Rescorla, E., "The Transport Layer Security (TLS) Protocol
1546	              Version 1.3", draft-ietf-tls-tls13-26 (work in progress),
1547	              March 2018.

1549	   [NEAT-flow-mapping]
1550	              Weinrank, F. and M. Tuexen, "Transparent Flow Mapping for
1551	              NEAT (in Workshop on Future of Internet Transport (FIT
1552	              2017))", June 2017.

1554	   [RFC5245]  Rosenberg, J., "Interactive Connectivity Establishment
1555	              (ICE): A Protocol for Network Address Translator (NAT)
1556	              Traversal for Offer/Answer Protocols", RFC 5245,
1557	              DOI 10.17487/RFC5245, April 2010,
1558	              <https://www.rfc-editor.org/info/rfc5245>.

1560	   [Trickle]  Ghobadi, M., Cheng, Y., Jain, A. and M. Mathis, "Trickle -
1561	              Rate Limiting YouTube Video Streaming (ATC 2012)", June
1562	              2012.

1564	Appendix A.  Additional Properties

1566	   This appendix discusses implementation considerations for additional
1567	   parameters and properties that could be used to enhance transport
1568	   protocol and/or path selection, or the transmission of messages given
1569	   a Protocol Stack that implements them.  These are not part of the
1570	   interface, and may be removed from the final document, but are
1571	   presented here to support discussion within the TAPS working group as
1572	   to whether they should be added to a future revision of the base
1573	   specification.

1575	A.1.  Properties Affecting Sorting of Branches

1577	   In addition to the Protocol and Path Selection Properties discussed
1578	   in Section 4.3, the following properties under discussion can
1579	   influence branch sorting:

1581	   o  Size to be Sent or Received: An implementation may use the Size to
1582	      be Sent or Received in combination with cached performance
1583	      estimates, see Section 8.2, e.g. the observed Round Trip Time and
1584	      the observed maximum throughput, to compute an estimate of the
1585	      completion time of a transfer over different available paths.  It
1586	      may then prefer the path with the shorter expected completion
1587	      time.  This property may be used instead of the Capacity Profile,
1588	      as the application does not always know whether its transfer will
1589	      be latency-bound or bandwidth-bound, and thus may not be able to
1590	      specify a Capacity Profile.  However, the application may know the
1591	      Size to be Sent or Received from metadata, e.g., in adaptive HTTP
1592	      streaming such as MPEG-DASH, or in operating system upgrades.  A
1593	      related paper is currently under submission.

1595	   o  Send / Receive Bitrate: If the application indicates an expected
1596	      send or receive bitrate, an implementation may prefer a path that
1597	      can likely provide the desired bandwidth, based on cached maximum
1598	      throughput, see Section 8.2.  The application may know the Send or
1599	      Receive Bitrate from metadata in adaptive HTTP streaming, such as
1600	      MPEG-DASH.

1602	   o  Cost Preferences: If the application indicates a preference to
1603	      avoid expensive paths, and some paths are associated with a
1604	      monetary cost, an implementation should decrease the ranking of
1605	      such paths.  If the application indicates that it prohibits using
1606	      expensive paths, paths that are associated with a cost should be
1607	      purged from the decision tree.
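
   For the Size to be Sent or Received property, a simple estimate of
   the completion time can be sketched as below.  The model (one Round
   Trip Time plus the transfer size divided by the cached maximum
   throughput) is an assumption chosen for illustration; it ignores
   effects such as slow start and loss, and the names are
   hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class PathEstimate:
    name: str
    rtt_s: float               # cached observed Round Trip Time, seconds
    max_throughput_bps: float  # cached observed maximum throughput, bit/s


def completion_time(path: PathEstimate, size_bytes: int) -> float:
    """Estimated seconds to transfer size_bytes over the given path."""
    return path.rtt_s + (size_bytes * 8) / path.max_throughput_bps


def preferred_path(paths: Iterable[PathEstimate],
                   size_bytes: int) -> PathEstimate:
    """Pick the path with the shortest estimated completion time."""
    return min(paths, key=lambda p: completion_time(p, size_bytes))
```

   Under this model, a small transfer tends to favor the path with the
   lower Round Trip Time, while a large transfer tends to favor the
   path with the higher throughput.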

1609	A.2.  Send Parameters

1611	   In addition to the Send Parameters listed in Section 5.1.1.1, the
1612	   following Send Parameters are under discussion:

1614	   o  Send Bitrate: If an application indicates a certain bitrate it
1615	      wants to send on the connection, the implementation may limit the
1616	      bitrate of the outgoing communication to that rate, for example by
1617	      setting an upper bound for the TCP congestion window of a
1618	      connection calculated from the Send Bitrate and the Round Trip
1619	      Time.  This helps to avoid bursty traffic patterns on video
1620	      streaming servers, see [Trickle].
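
   The upper bound on the congestion window mentioned above follows
   from the Send Bitrate and the Round Trip Time.  The sketch below
   shows one way to compute it; the rounding to whole segments, the
   floor of one segment, and the default segment size of 1460 bytes
   are assumptions made for this example.

```python
import math


def cwnd_upper_bound_segments(send_bitrate_bps: float,
                              rtt_s: float,
                              mss_bytes: int = 1460) -> int:
    """Largest congestion window, in segments, that keeps the sending
    rate at or below send_bitrate_bps for the given Round Trip Time."""
    # Bytes that may be in flight during one RTT at the requested rate.
    window_bytes = send_bitrate_bps * rtt_s / 8
    # Round down so the rate is not exceeded, but never go below one
    # segment, because a window of zero would stall the connection.
    return max(1, math.floor(window_bytes / mss_bytes))
```

   For example, a Send Bitrate of 8 Mbit/s over a path with a Round
   Trip Time of 100 ms allows about 100,000 bytes in flight, which
   this function rounds down to 68 segments of 1460 bytes.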

1622	Authors' Addresses

1624	   Anna Brunstrom (editor)
1625	   Karlstad University
1626	   Universitetsgatan 2
1627	   651 88 Karlstad
1628	   Sweden

1630	   Email: anna.brunstrom@kau.se

1632	   Tommy Pauly (editor)
1633	   Apple Inc.
1634	   One Apple Park Way
1635	   Cupertino, California 95014
1636	   United States of America

1638	   Email: tpauly@apple.com

1640	   Theresa Enghardt
1641	   TU Berlin
1642	   Marchstrasse 23
1643	   10587 Berlin
1644	   Germany

1646	   Email: theresa@inet.tu-berlin.de

1647	   Karl-Johan Grinnemo
1648	   Karlstad University
1649	   Universitetsgatan 2
1650	   651 88 Karlstad
1651	   Sweden

1653	   Email: karl-johan.grinnemo@kau.se

1655	   Tom Jones
1656	   University of Aberdeen
1657	   Fraser Noble Building
1658	   Aberdeen, AB24 3UE
1659	   UK

1661	   Email: tom@erg.abdn.ac.uk

1663	   Philipp S. Tiesel
1664	   TU Berlin
1665	   Marchstrasse 23
1666	   10587 Berlin
1667	   Germany

1669	   Email: philipp@inet.tu-berlin.de

1671	   Colin Perkins
1672	   University of Glasgow
1673	   School of Computing Science
1674	   Glasgow G12 8QQ
1675	   United Kingdom

1677	   Email: csp@csperkins.org

1679	   Michael Welzl
1680	   University of Oslo
1681	   PO Box 1080 Blindern
1682	   0316  Oslo
1683	   Norway

1685	   Email: michawe@ifi.uio.no