idnits 2.17.1 

draft-ietf-taps-impl-07.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** There are 7 instances of too long lines in the document, the longest one
     being 41 characters in excess of 72.

  ** The document seems to lack a both a reference to RFC 2119 and the
     recommended RFC 2119 boilerplate, even if it appears to use RFC 2119
     keywords. 

     RFC 2119 keyword, line 1239: '... Implementations SHOULD ensure that th...'


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (13 July 2020) is 1380 days in the past.  Is this
     intentional?


  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  == Outdated reference: A later version (-19) exists of
     draft-ietf-taps-arch-07

  == Outdated reference: A later version (-26) exists of
     draft-ietf-taps-interface-06

  ** Obsolete normative reference: RFC 7540 (Obsoleted by RFC 9113)

  == Outdated reference: A later version (-34) exists of
     draft-ietf-quic-transport-29

  == Outdated reference: A later version (-11) exists of
     draft-ietf-tcpm-2140bis-05

  -- Obsolete informational reference (is this intentional?): RFC 5389
     (Obsoleted by RFC 8489)

  -- Obsolete informational reference (is this intentional?): RFC 5766
     (Obsoleted by RFC 8656)


     Summary: 3 errors (**), 0 flaws (~~), 5 warnings (==), 3 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	TAPS Working Group                                     A. Brunstrom, Ed.
3	Internet-Draft                                       Karlstad University
4	Intended status: Informational                             T. Pauly, Ed.
5	Expires: 14 January 2021                                      Apple Inc.
6	                                                             T. Enghardt
7	                                                                 Netflix
8	                                                           K-J. Grinnemo
9	                                                     Karlstad University
10	                                                                T. Jones
11	                                                  University of Aberdeen
12	                                                               P. Tiesel
13	                                                               TU Berlin
14	                                                              C. Perkins
15	                                                   University of Glasgow
16	                                                                M. Welzl
17	                                                      University of Oslo
18	                                                            13 July 2020

20	             Implementing Interfaces to Transport Services
21	                        draft-ietf-taps-impl-07

23	Abstract

25	   The Transport Services (TAPS) system enables applications to use
26	   transport protocols flexibly for network communication and defines a
27	   protocol-independent TAPS Application Programming Interface (API)
28	   that is based on an asynchronous, event-driven interaction pattern.
29	   This document serves as a guide for implementers on how to build
30	   such a system.

32	Status of This Memo

34	   This Internet-Draft is submitted in full conformance with the
35	   provisions of BCP 78 and BCP 79.

37	   Internet-Drafts are working documents of the Internet Engineering
38	   Task Force (IETF).  Note that other groups may also distribute
39	   working documents as Internet-Drafts.  The list of current Internet-
40	   Drafts is at https://datatracker.ietf.org/drafts/current/.

42	   Internet-Drafts are draft documents valid for a maximum of six months
43	   and may be updated, replaced, or obsoleted by other documents at any
44	   time.  It is inappropriate to use Internet-Drafts as reference
45	   material or to cite them other than as "work in progress."

47	   This Internet-Draft will expire on 14 January 2021.

49	Copyright Notice

51	   Copyright (c) 2020 IETF Trust and the persons identified as the
52	   document authors.  All rights reserved.

54	   This document is subject to BCP 78 and the IETF Trust's Legal
55	   Provisions Relating to IETF Documents (https://trustee.ietf.org/
56	   license-info) in effect on the date of publication of this document.
57	   Please review these documents carefully, as they describe your rights
58	   and restrictions with respect to this document.  Code Components
59	   extracted from this document must include Simplified BSD License text
60	   as described in Section 4.e of the Trust Legal Provisions and are
61	   provided without warranty as described in the Simplified BSD License.

63	Table of Contents

65	   1.  Introduction
66	   2.  Implementing Connection Objects
67	   3.  Implementing Pre-Establishment
68	     3.1.  Configuration-time errors
69	     3.2.  Role of system policy
70	   4.  Implementing Connection Establishment
71	     4.1.  Candidate Gathering
72	       4.1.1.  Gathering Endpoint Candidates
73	       4.1.2.  Structuring Options as a Tree
74	       4.1.3.  Branch Types
75	       4.1.4.  Branching Order-of-Operations
76	       4.1.5.  Sorting Branches
77	     4.2.  Candidate Racing
78	       4.2.1.  Immediate
79	       4.2.2.  Delayed
80	       4.2.3.  Failover
81	     4.3.  Completing Establishment
82	       4.3.1.  Determining Successful Establishment
83	     4.4.  Establishing multiplexed connections
84	     4.5.  Handling racing with "unconnected" protocols
85	     4.6.  Implementing listeners
86	       4.6.1.  Implementing listeners for Connected Protocols
87	       4.6.2.  Implementing listeners for Unconnected Protocols
88	       4.6.3.  Implementing listeners for Multiplexed Protocols
89	   5.  Implementing Sending and Receiving Data
90	     5.1.  Sending Messages
91	       5.1.1.  Message Properties
92	       5.1.2.  Send Completion
93	       5.1.3.  Batching Sends
94	     5.2.  Receiving Messages
95	     5.3.  Handling of data for fast-open protocols
96	   6.  Implementing Message Framers
97	     6.1.  Defining Message Framers
98	     6.2.  Sender-side Message Framing
99	     6.3.  Receiver-side Message Framing
100	   7.  Implementing Connection Management
101	     7.1.  Pooled Connection
102	     7.2.  Handling Path Changes
103	   8.  Implementing Connection Termination
104	   9.  Cached State
105	     9.1.  Protocol state caches
106	     9.2.  Performance caches
107	   10. Specific Transport Protocol Considerations
108	     10.1.  TCP
109	     10.2.  UDP
110	     10.3.  UDP Multicast Receive
111	     10.4.  TLS
112	     10.5.  DTLS
113	     10.6.  HTTP
114	     10.7.  QUIC
115	     10.8.  HTTP/2 transport
116	     10.9.  SCTP
117	   11. IANA Considerations
118	   12. Security Considerations
119	     12.1.  Considerations for Candidate Gathering
120	     12.2.  Considerations for Candidate Racing
121	   13. Acknowledgements
122	   14. References
123	     14.1.  Normative References
124	     14.2.  Informative References
125	   Appendix A.  Additional Properties
126	     A.1.  Properties Affecting Sorting of Branches
127	   Appendix B.  Reasons for errors
128	   Appendix C.  Existing Implementations
129	   Authors' Addresses

131	1.  Introduction

133	   The Transport Services architecture [I-D.ietf-taps-arch] defines a
134	   system that allows applications to use transport networking protocols
135	   flexibly.  The interface such a system exposes to applications is
136	   defined as the Transport Services API [I-D.ietf-taps-interface].
137	   This API is designed to be generic across multiple transport
138	   protocols and sets of protocol features.

140	   This document serves as a guide for implementers on how to build a
141	   system that provides a Transport Services API.  It is the job of an
142	   implementation of a Transport Services system to turn the requests of
143	   an application into decisions on how to establish connections, and
144	   how to transfer data over those connections once established.  The
145	   terminology used in this document is based on the Architecture
146	   [I-D.ietf-taps-arch].

148	2.  Implementing Connection Objects

150	   The connection objects that are exposed to applications for Transport
151	   Services are:

153	   *  the Preconnection, the bundle of Properties that describes the
154	      application constraints on the transport;

156	   *  the Connection, the basic object that represents a flow of data as
157	      Messages in either direction between the Local and Remote
158	      Endpoints;

160	   *  and the Listener, a passive waiting object that delivers new
161	      Connections.

163	   Preconnection objects should be implemented as bundles of properties
164	   that an application can both read and write.  Once a Preconnection
165	   has been used to create an outbound Connection or a Listener, the
166	   implementation should ensure that the copy of the properties held by
167	   the Connection or Listener is immutable.  This may involve performing
168	   a deep-copy if the application is still able to modify properties on
169	   the original Preconnection object.
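
   As a minimal sketch of the deep-copy approach described above, the
   following Python fragment snapshots the properties when a Connection
   is created.  The class names mirror the API objects, but the
   "properties" dictionary, the "initiate" method, and the property
   name "reliability" are illustrative assumptions, not part of the
   Transport Services API.

```python
import copy
from types import MappingProxyType


class Preconnection:
    """Hypothetical Preconnection: a bundle the application can edit."""

    def __init__(self):
        self.properties = {}

    def initiate(self):
        # The Connection receives its own snapshot of the properties.
        return Connection(self.properties)


class Connection:
    """Hypothetical Connection holding a read-only snapshot."""

    def __init__(self, properties):
        # Deep-copy first so later edits to the Preconnection do not
        # leak in, then expose the copy through a read-only view.
        self._properties = MappingProxyType(copy.deepcopy(properties))

    @property
    def properties(self):
        return self._properties


pre = Preconnection()
pre.properties["reliability"] = "require"
conn = pre.initiate()
pre.properties["reliability"] = "prohibit"  # does not affect conn
```

   Note that the read-only view only guards the top-level mapping; an
   implementation that stores nested mutable values would need to
   protect those as well.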

171	   Connection objects represent the interface between the application
172	   and the implementation to manage transport state, and conduct data
173	   transfer.  During the process of establishment (Section 4), the
174	   Connection is not yet bound to a specific transport flow, since there
175	   may be multiple candidate Protocol Stacks being raced.  Once the
176	   Connection is established, the object should be considered mapped to
177	   a specific Protocol Stack.  The notion of a Connection maps to many
178	   different protocols, depending on the Protocol Stack.  For example,
179	   the Connection may ultimately represent the interface into a TCP
180	   connection, a TLS session over TCP, a UDP flow with fully-specified
181	   local and remote endpoints, a DTLS session, an SCTP stream, a QUIC
182	   stream, or an HTTP/2 stream.

184	   Listener objects are created with a Preconnection, at which point
185	   their configuration should be considered immutable by the
186	   implementation.  The process of listening is described in
187	   Section 4.6.

189	3.  Implementing Pre-Establishment

191	   During pre-establishment the application specifies the Endpoints to
192	   be used for communication as well as its preferences via Selection
193	   Properties and, if desired, also Connection Properties.  Generally,
194	   Connection Properties should be configured as early as possible,
195	   because they can serve as input to decisions that are made by the
196	   implementation (e.g., the Capacity Profile can guide usage of a
197	   protocol offering scavenger-type congestion control).

199	   The implementation stores these properties as a part of the
200	   Preconnection object for use during connection establishment.  For
201	   Selection Properties that are not provided by the application, the
202	   implementation must use the default values specified in the Transport
203	   Services API ([I-D.ietf-taps-interface]).

205	3.1.  Configuration-time errors

207	   The transport system should have a list of supported protocols
208	   available, which each have transport features reflecting the
209	   capabilities of the protocol.  Once an application specifies its
210	   Transport Properties, the transport system matches the required and
211	   prohibited properties against the transport features of the available
212	   protocols.

214	   In the following cases, failure should be detected during pre-
215	   establishment:

217	   *  A request by an application for Protocol Properties that include
218	      requirements or prohibitions that cannot be satisfied by any of
219	      the available protocols.  For example, if an application requires
220	      "Configure Reliability per Message", but no protocol offering it
221	      is available on the host running the transport system (e.g.,
222	      because SCTP is not supported by the operating system), this
223	      should result in an error.

225	   *  A request by an application for Protocol Properties that are in
226	      conflict with each other, i.e., the required and prohibited
227	      properties cannot be satisfied by the same protocol.  For example,
228	      if an application prohibits "Reliable Data Transfer" but then
229	      requires "Configure Reliability per Message", this mismatch should
230	      result in an error.

232	   It is important that such cases fail as early as possible, to avoid
233	   allocating resources (e.g., for endpoint resolution) only to find out
234	   later that no protocol satisfies the requirements.
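
   The two failure cases above can be sketched as a single check that
   runs before any resources are allocated.  This is a sketch under
   stated assumptions: the feature names and the contents of
   PROTOCOL_FEATURES are illustrative labels, and ConfigurationError and
   eligible_protocols are hypothetical names.

```python
class ConfigurationError(Exception):
    """Raised when no available protocol can satisfy the request."""


# Illustrative feature table (an assumption); a real system would derive
# this from the protocols actually supported on the host.
PROTOCOL_FEATURES = {
    "TCP": {"reliability", "preserve-order"},
    "UDP": {"preserve-msg-boundaries"},
    "SCTP": {"reliability", "preserve-order", "per-msg-reliability",
             "preserve-msg-boundaries"},
}


def eligible_protocols(required, prohibited, available=PROTOCOL_FEATURES):
    """Return names of protocols matching the request, or raise."""
    matches = [
        name for name, features in available.items()
        if required <= features and not (prohibited & features)
    ]
    if not matches:
        # Covers both unsatisfiable and mutually conflicting requests.
        raise ConfigurationError(
            f"no protocol satisfies require={sorted(required)} "
            f"prohibit={sorted(prohibited)}"
        )
    return matches
```

   With this table, requiring "per-msg-reliability" while prohibiting
   "reliability" raises an error, because the only protocol offering
   the former also offers the latter.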

236	3.2.  Role of system policy

238	   The properties specified during pre-establishment have a close
239	   relationship to system policy.  The implementation is responsible for
240	   combining and reconciling several different sources of preferences
241	   when establishing Connections.  These include, but are not limited
242	   to:

244	   1.  Application preferences, i.e., preferences specified during the
245	       pre-establishment via Selection Properties.

247	   2.  Dynamic system policy, i.e., policy compiled from internally and
248	       externally acquired information about available network
249	       interfaces, supported transport protocols, and current/previous
250	       Connections.  Examples of ways to externally retrieve policy-
251	       support information are through OS-specific statistics/
252	       measurement tools and tools that reside on middleboxes and
253	       routers.

255	   3.  Default implementation policy, i.e., predefined policy by OS or
256	       application.

258	   In general, any protocol or path used for a connection must conform
259	   to all three sources of constraints.  A violation of any of these
260	   sources should cause a protocol or path to be considered ineligible
261	   for use.  For an example of application preferences leading to
262	   constraints, an application may prohibit the use of metered network
263	   interfaces for a given Connection to avoid user cost.  Similarly, the
264	   system policy at a given time may prohibit the use of such a metered
265	   network interface from the application's process.  Lastly, the
266	   implementation itself may default to disallowing certain network
267	   interfaces unless explicitly requested by the application and allowed
268	   by the system.
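
   One way to picture this is as a conjunction of independent checks,
   one per source of constraints.  The sketch below assumes each source
   can be expressed as a predicate over an interface description; the
   dictionary keys and the three policy functions are hypothetical.

```python
def conforming(candidates, *policies):
    """Keep only candidates accepted by every policy source."""
    return [c for c in candidates if all(policy(c) for policy in policies)]


def application_policy(interface):
    # Hypothetical application preference: avoid user cost.
    return not interface["metered"]


def system_policy(interface):
    # Hypothetical dynamic system policy: nothing is restricted right now.
    return True


def default_policy(interface):
    # Hypothetical implementation default: skip VPNs unless requested.
    return not interface["vpn"]


interfaces = [
    {"name": "Wi-Fi", "metered": False, "vpn": False},
    {"name": "LTE", "metered": True, "vpn": False},
    {"name": "utun0", "metered": False, "vpn": True},
]
usable = conforming(interfaces, application_policy, system_policy,
                    default_policy)
```

   Here only the Wi-Fi entry survives: LTE is rejected by the
   application preference and the VPN interface by the implementation
   default.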

270	   It is expected that the database of system policies and the method of
271	   looking up these policies will vary across various platforms.  An
272	   implementation should attempt to look up the relevant policies for
273	   the system in a dynamic way to make sure it is reflecting an accurate
274	   version of the system policy, since the system's policy regarding the
275	   application's traffic may change over time due to user or
276	   administrative changes.

278	4.  Implementing Connection Establishment

280	   The process of establishing a network connection begins when an
281	   application expresses intent to communicate with a remote endpoint by
282	   calling Initiate.  (At this point, any constraints or requirements
283	   the application may have on the connection are available from pre-
284	   establishment.)  The process can be considered complete once there is
285	   at least one Protocol Stack that has completed any required setup to
286	   the point that it can transmit and receive the application's data.

288	   Connection establishment is divided into two top-level steps:
289	   Candidate Gathering, to identify the paths, protocols, and endpoints
290	   to use, and Candidate Racing, in which the necessary protocol
291	   handshakes are conducted so that the transport system can select
292	   which set to use.  This document structures candidates for racing as
293	   a tree.

295	   The simplest example of this process might involve identifying the
296	   single IP address to which the implementation wishes to connect,
297	   using the system's current default interface or path, and starting a
298	   TCP handshake to establish a stream to the specified IP address.
299	   However, each step may also vary depending on the requirements of the
300	   connection: if the endpoint is defined as a hostname and port, then
301	   there may be multiple resolved addresses that are available; there
302	   may also be multiple interfaces or paths available, other than the
303	   default system interface; and some protocols may not need any
304	   transport handshake to be considered "established" (such as UDP),
305	   while other connections may utilize layered protocol handshakes, such
306	   as TLS over TCP.

308	   Whenever an implementation has multiple options for connection
309	   establishment, it can view the set of all individual connection
310	   establishment options as a single, aggregate connection
311	   establishment.  The aggregate set conceptually includes every valid
312	   combination of endpoints, paths, and protocols.  As an example,
313	   consider an implementation that initiates a TCP connection to a
314	   hostname + port endpoint, and has two valid interfaces available (Wi-
315	   Fi and LTE).  The hostname resolves to a single IPv4 address on the
316	   Wi-Fi network, and resolves to the same IPv4 address on the LTE
317	   network, as well as a single IPv6 address.  The aggregate set of
318	   connection establishment options can be viewed as follows:

320	Aggregate [Endpoint: www.example.com:80] [Interface: Any]   [Protocol: TCP]
321	|-> [Endpoint: 192.0.2.1:80]       [Interface: Wi-Fi] [Protocol: TCP]
322	|-> [Endpoint: 192.0.2.1:80]       [Interface: LTE]   [Protocol: TCP]
323	|-> [Endpoint: 2001:DB8::1.80]     [Interface: LTE]   [Protocol: TCP]

324	   Any one of these sub-entries on the aggregate connection attempt
325	   would satisfy the original application intent.  The concern of this
326	   section is the algorithm defining which of these options to try,
327	   when, and in what order.

329	   During Candidate Gathering, an implementation first excludes all
330	   protocols and paths that match a Prohibit or do not match all Require
331	   properties.  Then, the implementation will sort branches according to
332	   Preferred properties, Avoided properties, and possibly other
333	   criteria.
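
   The exclude-then-sort step can be sketched as follows.  Scoring a
   candidate by counting matched Preferred and Avoided properties is an
   assumption made for illustration, and the Candidate type, the gather
   function, and the feature labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    features: frozenset


def gather(candidates, require=frozenset(), prohibit=frozenset(),
           prefer=frozenset(), avoid=frozenset()):
    """Drop ineligible candidates, then rank the remainder."""
    eligible = [
        c for c in candidates
        if require <= c.features and not (prohibit & c.features)
    ]

    def score(c):
        # Assumed scoring: +1 per Preferred match, -1 per Avoided match.
        return len(prefer & c.features) - len(avoid & c.features)

    # sorted() is stable, so equally scored candidates keep their order.
    return sorted(eligible, key=score, reverse=True)


candidates = [
    Candidate("TCP/LTE", frozenset({"reliability", "metered"})),
    Candidate("TCP/Wi-Fi", frozenset({"reliability"})),
    Candidate("UDP/Wi-Fi", frozenset()),
]
ranked = gather(candidates, require=frozenset({"reliability"}),
                avoid=frozenset({"metered"}))
```

   In this example the UDP candidate is excluded for lacking a Required
   property, and the metered LTE candidate is ranked after Wi-Fi.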

335	4.1.  Candidate Gathering

337	   The step of gathering candidates involves identifying which paths,
338	   protocols, and endpoints may be used for a given Connection.  This
339	   list is determined by the requirements, prohibitions, and preferences
340	   of the application as specified in the Selection Properties.

342	4.1.1.  Gathering Endpoint Candidates

344	   Both Local and Remote Endpoint Candidates must be discovered during
345	   connection establishment.  To support Interactive Connectivity
346	   Establishment (ICE) [RFC8445], or similar protocols, that involve
347	   out-of-band indirect signalling to exchange candidates with the
348	   Remote Endpoint, it's important to be able to query the set of
349	   candidate Local Endpoints, and give the protocol stack a set of
350	   candidate Remote Endpoints, before it attempts to establish
351	   connections.

353	4.1.1.1.  Local Endpoint candidates

355	   The set of possible Local Endpoints is gathered.  In the simple case,
356	   this merely enumerates the local interfaces and protocols, and
357	   allocates ephemeral source ports.  For example, a system that has
358	   Wi-Fi and Ethernet and supports IPv4 and IPv6 might gather four
359	   candidate locals (IPv4 on Ethernet, IPv6 on Ethernet, IPv4 on Wi-Fi,
360	   and IPv6 on Wi-Fi) that can form the source for a transient.

362	   If NAT traversal is required, the process of gathering Local
363	   Endpoints becomes broadly equivalent to the ICE candidate gathering
364	   phase (see Section 5.1.1 of [RFC8445]).  The endpoint determines its
365	   server reflexive Local Endpoints (i.e., the translated address of a
366	   local, on the other side of a NAT, e.g., via a STUN server [RFC5389])
367	   and relayed locals (e.g., via a TURN server [RFC5766] or other
368	   relay), for each interface and network protocol.  These are added to
369	   the set of candidate Local Endpoints for this connection.

371	   Gathering Local Endpoints is primarily a local operation, although it
372	   might involve exchanges with a STUN server to derive server reflexive
373	   locals, or with a TURN server or other relay to derive relayed
374	   locals.  However, it does not involve communication with the Remote
375	   Endpoint.

377	4.1.1.2.  Remote Endpoint Candidates

379	   The Remote Endpoint is typically a name that needs to be resolved
380	   into a set of possible addresses that can be used for communication.
381	   Resolving the Remote Endpoint is the process of recursively
382	   performing such name lookups, until fully resolved, to return the set
383	   of candidates for the remote of this connection.

385	   How this is done will depend on the type of the Remote Endpoint, and
386	   can also be specific to each Local Endpoint.  A common case is when
387	   the Remote Endpoint is a DNS name, in which case it is resolved to
388	   give a set of IPv4 and IPv6 addresses representing that name.  Some
389	   types of remote might require more complex resolution.  Resolving the
390	   Remote Endpoint for a peer-to-peer connection might involve
391	   communication with a rendezvous server, which in turn contacts the
392	   peer to gain consent to communicate and retrieve its set of candidate
393	   locals, which are returned and form the candidate remote addresses
394	   for contacting that peer.

396	   Resolving the remote is not a local operation.  It will involve a
397	   directory service, and can require communication with the remote to
398	   rendezvous and exchange peer addresses.  This can expose some or all
399	   of the candidate locals to the remote.

401	4.1.2.  Structuring Options as a Tree

403	   When an implementation responsible for connection establishment needs
404	   to consider multiple options, it should logically structure these
405	   options as a hierarchical tree.  Each leaf node of the tree
406	   represents a single, coherent connection attempt, with an Endpoint, a
407	   Path, and a set of protocols that can directly negotiate and send
408	   data on the network.  Each node in the tree that is not a leaf
409	   represents a connection attempt that is either underspecified, or
410	   else includes multiple distinct options.  For example, when
411	   connecting on an IP network, a connection attempt to a hostname and
412	   port is underspecified, because the connection attempt requires a
413	   resolved IP address as its remote endpoint.  In this case, the node
414	   represented by the connection attempt to the hostname is a parent
415	   node, with child nodes for each IP address.  Similarly, an
416	   implementation that is allowed to connect using multiple interfaces
417	   will have a parent node of the tree for the decision between the
418	   paths, with a branch for each interface.

420	   The example aggregate connection attempt above can be drawn as a tree
421	   by grouping the addresses resolved on the same interface into
422	   branches:

424	                             ||
425	                +==========================+
426	                |  www.example.com:80/Any  |
427	                +==========================+
428	                  //                    \\
429	+==========================+       +==========================+
430	| www.example.com:80/Wi-Fi |       |  www.example.com:80/LTE  |
431	+==========================+       +==========================+
432	             ||                      //                    \\
433	  +====================+  +====================+  +======================+
434	  | 192.0.2.1:80/Wi-Fi |  |  192.0.2.1:80/LTE  |  |  2001:DB8::1.80/LTE  |
435	  +====================+  +====================+  +======================+

437	   The rest of this section will use a notation scheme to represent this
438	   tree.  The parent (or trunk) node of the tree will be represented by
439	   a single integer, such as "1".  Each child of that node will have an
440	   integer that identifies it, from 1 to the number of children.  That
441	   child node will be uniquely identified by concatenating its integer
442	   to its parent's identifier with a dot in between, such as "1.1" and
443	   "1.2".  Each node will be summarized by a tuple of three elements:
444	   Endpoint, Path, and Protocol.  The above example can now be written
445	   more succinctly as:

447	   1 [www.example.com:80, Any, TCP]
448	     1.1 [www.example.com:80, Wi-Fi, TCP]
449	       1.1.1 [192.0.2.1:80, Wi-Fi, TCP]
450	     1.2 [www.example.com:80, LTE, TCP]
451	       1.2.1 [192.0.2.1:80, LTE, TCP]
452	       1.2.2 [2001:DB8::1.80, LTE, TCP]
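
   The notation can be generated mechanically from a tree structure.
   The sketch below is one possible representation; the Node class and
   the render function are hypothetical names, and the endpoint strings
   are copied from the example above.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    endpoint: str
    path: str
    protocol: str
    children: list = field(default_factory=list)

    def add(self, endpoint, path, protocol):
        child = Node(endpoint, path, protocol)
        self.children.append(child)
        return child


def render(node, ident="1", depth=0):
    """Return one line per node, using dotted identifiers."""
    indent = "  " * depth
    lines = [f"{indent}{ident} "
             f"[{node.endpoint}, {node.path}, {node.protocol}]"]
    for i, child in enumerate(node.children, start=1):
        lines.extend(render(child, f"{ident}.{i}", depth + 1))
    return lines


root = Node("www.example.com:80", "Any", "TCP")
wifi = root.add("www.example.com:80", "Wi-Fi", "TCP")
wifi.add("192.0.2.1:80", "Wi-Fi", "TCP")
lte = root.add("www.example.com:80", "LTE", "TCP")
lte.add("192.0.2.1:80", "LTE", "TCP")
lte.add("2001:DB8::1.80", "LTE", "TCP")
print("\n".join(render(root)))
```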

454	   When an implementation views this aggregate set of connection
455	   attempts as a single connection establishment, it will use only one
456	   of the leaf nodes to transfer data.  Thus, when a single leaf node
457	   becomes ready to use, then the entire connection attempt is ready to
458	   use by the application.  Another way to represent this is that every
459	   leaf node updates the state of its parent node when it becomes ready,
460	   until the trunk node of the tree is ready, which then notifies the
461	   application that the connection as a whole is ready to use.
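
   The upward propagation of readiness can be sketched as below.  The
   AttemptNode class, its fields, and the on_ready callback are
   assumptions for illustration; a real implementation would also
   cancel or retire the losing branches.

```python
class AttemptNode:
    """Hypothetical node in a connection establishment tree."""

    def __init__(self, label, parent=None, on_ready=None):
        self.label = label
        self.parent = parent
        self.on_ready = on_ready  # callback, used on the trunk node only
        self.ready = False

    def mark_ready(self, leaf=None):
        leaf = leaf if leaf is not None else self
        if self.ready:
            return  # another branch already reached this node
        self.ready = True
        if self.parent is not None:
            self.parent.mark_ready(leaf)
        elif self.on_ready is not None:
            self.on_ready(leaf)  # trunk: notify the application once


notified = []
trunk = AttemptNode("1", on_ready=lambda leaf: notified.append(leaf.label))
wifi = AttemptNode("1.1", parent=trunk)
lte = AttemptNode("1.2", parent=trunk)
leaf_a = AttemptNode("1.1.1", parent=wifi)
leaf_b = AttemptNode("1.2.1", parent=lte)

leaf_b.mark_ready()  # first leaf to complete its handshake
leaf_a.mark_ready()  # a later success does not notify again
```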

463	   A connection establishment tree may be degenerate, and only have a
464	   single leaf node, such as a connection attempt to an IP address over
465	   a single interface with a single protocol.

467	   1 [192.0.2.1:80, Wi-Fi, TCP]

468	   A parent node may also have only one child (or leaf) node, such as
469	   when a hostname resolves to only a single IP address.

471	   1 [www.example.com:80, Wi-Fi, TCP]
472	     1.1 [192.0.2.1:80, Wi-Fi, TCP]

474	4.1.3.  Branch Types

476	   There are three types of branching from a parent node into one or
477	   more child nodes.  Any parent node of the tree must use only one type
478	   of branching.

480	4.1.3.1.  Derived Endpoints

482	   If a connection originally targets a single endpoint, there may be
483	   multiple endpoints of different types that can be derived from the
484	   original.  The connection library creates an ordered list of the
485	   derived endpoints according to application preference, system policy
486	   and expected performance.

488	   DNS hostname-to-address resolution is the most common method of
489	   endpoint derivation.  When trying to connect to a hostname endpoint
490	   on a traditional IP network, the implementation should send DNS
491	   queries for both A (IPv4) and AAAA (IPv6) records if both are
492	   supported on the local link.  The algorithm for ordering and racing
493	   these addresses should follow the recommendations in Happy Eyeballs
494	   [RFC8305].

496	   1 [www.example.com:80, Wi-Fi, TCP]
497	     1.1 [2001:DB8::1.80, Wi-Fi, TCP]
498	     1.2 [192.0.2.1:80, Wi-Fi, TCP]
499	     1.3 [2001:DB8::2.80, Wi-Fi, TCP]
500	     1.4 [2001:DB8::3.80, Wi-Fi, TCP]
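
   The ordering in the example above is consistent with the address
   interleaving described in [RFC8305].  The following is a simplified
   sketch only: it assumes IPv6 is the first address family, assumes
   the input lists are already sorted by preference, and omits the
   destination address selection and resolution timing that [RFC8305]
   also specifies.  The function name is hypothetical.

```python
import ipaddress
from itertools import zip_longest


def interleave(addresses, first_family_count=1):
    """Order addresses as: some IPv6 first, then alternate families."""
    v6 = [a for a in addresses if ipaddress.ip_address(a).version == 6]
    v4 = [a for a in addresses if ipaddress.ip_address(a).version == 4]

    ordered = v6[:first_family_count]
    remaining_v6 = v6[first_family_count:]
    # Alternate between families until both lists are exhausted.
    for a4, a6 in zip_longest(v4, remaining_v6):
        if a4 is not None:
            ordered.append(a4)
        if a6 is not None:
            ordered.append(a6)
    return ordered


order = interleave(
    ["2001:DB8::1", "2001:DB8::2", "2001:DB8::3", "192.0.2.1"]
)
```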

502	   DNS-Based Service Discovery [RFC6763] can also provide an endpoint
503	   derivation step.  When trying to connect to a named service, the
504	   client may discover one or more hostname and port pairs on the local
505	   network using multicast DNS [RFC6762].  These hostnames should each
506	   be treated as a branch that can be attempted independently from other
507	   hostnames.  Each of these hostnames might resolve to one or more
508	   addresses, which would create multiple layers of branching.

510	   1 [term-printer._ipp._tcp.meeting.ietf.org, Wi-Fi, TCP]
511	     1.1 [term-printer.meeting.ietf.org:631, Wi-Fi, TCP]
512	       1.1.1 [31.133.160.18.631, Wi-Fi, TCP]

514	4.1.3.2.  Alternate Paths

516	   If a client has multiple network interfaces available to it, e.g., a
517	   mobile client with both Wi-Fi and Cellular connectivity, it can
518	   attempt a connection over any of the interfaces.  This represents a
519	   branch point in the connection establishment.  Similar to a derived
520	   endpoint, the interfaces should be ranked based on preference, system
521	   policy, and performance.  Attempts should be started on one
522	   interface, and then on other interfaces successively after delays
523	   based on expected round-trip-time or other available metrics.

525	   1 [192.0.2.1:80, Any, TCP]
526	     1.1 [192.0.2.1:80, Wi-Fi, TCP]
527	     1.2 [192.0.2.1:80, LTE, TCP]

529	   This same approach applies to any situation in which the client is
530	   aware of multiple links or views of the network.  Multiple Paths,
531	   each with a coherent set of addresses, routes, DNS server, and more,
532	   may share a single interface.  A path may also represent a virtual
533	   interface service such as a Virtual Private Network (VPN).

535	   The list of available paths should be constrained by any requirements
536	   or prohibitions the application sets, as well as system policy.

538	4.1.3.3.  Protocol Options

540	   Differences in possible protocol compositions and options can also
541	   provide a branching point in connection establishment.  This allows
542	   clients to be resilient to situations in which a certain protocol is
543	   not functioning on a server or network.

545	   This approach is commonly used for connections with optional proxy
546	   server configurations.  A single connection might have several
547	   options available: an HTTP-based proxy, a SOCKS-based proxy, or no
548	   proxy.  These options should be ranked and attempted in succession.

550	   1 [www.example.com:80, Any, HTTP/TCP]
551	     1.1 [192.0.2.8:80, Any, HTTP/HTTP Proxy/TCP]
552	     1.2 [192.0.2.7:10234, Any, HTTP/SOCKS/TCP]
553	     1.3 [www.example.com:80, Any, HTTP/TCP]
554	       1.3.1 [192.0.2.1:80, Any, HTTP/TCP]

556	   This approach also allows a client to attempt different sets of
557	   application and transport protocols that, when available, could
558	   provide preferable features.  For example, the protocol options could
559	   involve QUIC [I-D.ietf-quic-transport] over UDP on one branch, and
560	   HTTP/2 [RFC7540] over TLS over TCP on the other:

562	   1 [www.example.com:443, Any, Any HTTP]
563	     1.1 [www.example.com:443, Any, QUIC/UDP]
564	       1.1.1 [192.0.2.1:443, Any, QUIC/UDP]
565	     1.2 [www.example.com:443, Any, HTTP2/TLS/TCP]
566	       1.2.1 [192.0.2.1:443, Any, HTTP2/TLS/TCP]

568	   Another example is racing SCTP with TCP:

570	   1 [www.example.com:80, Any, Any Stream]
571	     1.1 [www.example.com:80, Any, SCTP]
572	       1.1.1 [192.0.2.1:80, Any, SCTP]
573	     1.2 [www.example.com:80, Any, TCP]
574	       1.2.1 [192.0.2.1:80, Any, TCP]

576	   Implementations that support racing protocols and protocol options
577	   should maintain a history of which protocols and protocol options
578	   successfully established, on a per-network and per-endpoint basis
579	   (see Section 9.2).  This information can influence future racing
580	   decisions to prioritize or prune branches.

582	4.1.4.  Branching Order-of-Operations

584	   Branch types must occur in a specific order relative to one another
585	   to avoid creating leaf nodes with invalid or incompatible settings.
586	   In the example above, it would be invalid to branch for derived
587	   endpoints (the DNS results for www.example.com) before branching
588	   between interface paths, since there are situations when the results
589	   will be different across networks due to private names or different
590	   supported IP versions.  Implementations must be careful to branch in
591	   an order that results in usable leaf nodes whenever there are
592	   multiple branch types that could be used from a single node.

594	   The order of operations for branching, where lower numbers are acted
595	   upon first, should be:

597	   1.  Alternate Paths

599	   2.  Protocol Options

601	   3.  Derived Endpoints

603	   Branching between paths is the first in the list because results
604	   across multiple interfaces are likely not related to one another:
605	   endpoint resolution may return different results, especially when
606	   using locally resolved host and service names, and which protocols
607	   are supported and preferred may differ across interfaces.  Thus, if
608	   multiple paths are attempted, the overall connection can be seen as a
609	   race between the available paths or interfaces.

611	   Protocol options are checked next.  Whether or not a set of
612	   protocols, or protocol-specific options, can successfully connect is
613	   generally not dependent on which specific IP address is used.
614	   Furthermore, the protocol stacks being attempted may influence or
615	   altogether change the endpoints being used.  Adding a proxy to a
616	   connection's branch will change the endpoint to the proxy's IP
617	   address or hostname.  Choosing an alternate protocol may also modify
618	   the ports that should be selected.

620	   Branching for derived endpoints is the final step, and may have
621	   multiple layers of derivation or resolution, such as DNS service
622	   resolution and DNS hostname resolution.

624	   For example, if the application has indicated both a preference for
625	   Wi-Fi over LTE and for a feature only available in SCTP, branches
626	   will first be sorted according to path selection, with Wi-Fi at the
627	   top.  Then, branches with SCTP will be sorted to the top within their
628	   subtree according to the properties influencing protocol selection.
629	   However, if the implementation has current cache information that
630	   SCTP is not available on the path over Wi-Fi, there is no SCTP node
631	   in the Wi-Fi subtree.  Here, the path over Wi-Fi will be tried first,
632	   and, if connection establishment succeeds, TCP will be used.  So the
633	   Selection Property of preferring Wi-Fi takes precedence over the
634	   Property that led to a preference for SCTP.

636	   1 [www.example.com:80, Any, Any Stream]
637	     1.1 [192.0.2.1:80, Wi-Fi, Any Stream]
638	       1.1.1 [192.0.2.1:80, Wi-Fi, TCP]
639	     1.2 [192.0.3.1:80, LTE, Any Stream]
640	       1.2.1 [192.0.3.1:80, LTE, SCTP]
641	       1.2.2 [192.0.3.1:80, LTE, TCP]
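
   The branching order described in this section can be sketched as
   nested expansion: paths first, then protocol options, then derived
   endpoints.  The sketch below is illustrative only.  The names
   "build_candidates", "resolve", and "unavailable" are hypothetical,
   and the per-path resolver is assumed to return already-sorted
   addresses.

```python
def build_candidates(paths, protocols, resolve, unavailable=frozenset()):
    """Expand a request in the order paths, protocols, endpoints.

    `resolve(path)` is a hypothetical per-path resolver.  `unavailable`
    holds (path, protocol) pairs that cached state rules out.
    """
    leaves = []
    for path in paths:                        # 1. Alternate Paths
        for protocol in protocols:            # 2. Protocol Options
            if (path, protocol) in unavailable:
                continue                      # no such node in this subtree
            for address in resolve(path):     # 3. Derived Endpoints
                leaves.append((address, path, protocol))
    return leaves


# Assumed inputs mirroring the example tree above.
addresses = {"Wi-Fi": ["192.0.2.1:80"], "LTE": ["192.0.3.1:80"]}
leaves = build_candidates(
    paths=["Wi-Fi", "LTE"],                   # Wi-Fi preferred over LTE
    protocols=["SCTP", "TCP"],                # SCTP preferred over TCP
    resolve=lambda path: addresses[path],
    unavailable={("Wi-Fi", "SCTP")},          # cached: no SCTP over Wi-Fi
)
```

   Because resolution happens inside the per-path loop, each path can
   yield different addresses, and the Wi-Fi leaf using TCP is still
   ordered ahead of the LTE leaf using SCTP.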

643	4.1.5.  Sorting Branches

645	   Implementations should sort the branches of the tree of connection
646	   options in order of their preference rank, from most preferred to
647	   least preferred.  Leaf nodes on branches with higher rankings
648	   represent connection attempts that will be raced first.
649	   Implementations should order the branches to reflect the preferences
650	   expressed by the application for its new connection, including
651	   Selection Properties, which are specified in
652	   [I-D.ietf-taps-interface].

654	   In addition to the properties provided by the application, an
655	   implementation may include additional criteria in the ranking, such
656	   as cached performance estimates (see Section 9.2) or system policy
657	   (see Section 3.2).  Two examples of how Selection and Connection
658	   Properties may be used to sort branches are provided below:

661	   *  "Interface Instance or Type": If the application specifies an
662	      interface type to be preferred or avoided, implementations should
663	      accordingly rank the paths.  If the application specifies an
664	      interface type to be required or prohibited, an implementation is
665	      expected to not include the non-conforming paths.

667	   *  "Capacity Profile": An implementation can use the Capacity Profile
668	      to prefer paths that match an application's expected traffic
669	      pattern.  This match will use cached performance estimates, see
670	      Section 9.2:

672	      -  Scavenger: Prefer paths with the highest expected available
673	         capacity, based on the observed maximum throughput;

675	      -  Low Latency/Interactive: Prefer paths with the lowest expected
676	         Round Trip Time, based on observed round trip time estimates;

678	      -  Constant-Rate Streaming: Prefer paths that are expected to
679	         satisfy the requested Stream Send or Stream Receive Bitrate,
680	         based on the observed maximum throughput.

682	   Implementations process the Properties in the following order:
683	   Prohibit, Require, Prefer, Avoid.  If Selection Properties contain
684	   any prohibited properties, the implementation should first purge
685	   branches containing nodes with these properties.  For required
686	   properties, it should only keep branches that satisfy these
687	   requirements.  It should then order the branches according to the
688	   preferred properties, and finally use any avoided properties as a
689	   tiebreaker.  When ordering branches, an implementation can give more
690	   weight to properties that the application has explicitly set, than to
691	   the properties that are default.
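
   The processing order above (Prohibit, Require, Prefer, Avoid) can be
   sketched as two filters followed by a stable sort.  This is a
   simplified illustration: the "Branch" type, the property labels, and
   the use of simple match counts for ranking are assumptions of the
   sketch and are not taken from the interface specification.

```python
from dataclasses import dataclass, field


@dataclass
class Branch:
    name: str
    properties: set = field(default_factory=set)


def sort_branches(branches, prohibit=(), require=(), prefer=(), avoid=()):
    prohibit, require = set(prohibit), set(require)
    prefer, avoid = set(prefer), set(avoid)
    # 1. Prohibit: purge branches that have any prohibited property.
    kept = [b for b in branches if not (b.properties & prohibit)]
    # 2. Require: keep only branches that satisfy every requirement.
    kept = [b for b in kept if require <= b.properties]
    # 3. Prefer: more preferred matches rank higher.
    # 4. Avoid: fewer avoided matches break ties.
    # sorted() is stable, so remaining ties keep their input order.
    return sorted(
        kept,
        key=lambda b: (-len(b.properties & prefer), len(b.properties & avoid)),
    )


branches = [
    Branch("Wi-Fi/TCP", {"wifi", "reliable"}),
    Branch("LTE/TCP", {"cellular", "reliable"}),
    Branch("Wi-Fi/SCTP", {"wifi", "reliable", "multistreaming"}),
    Branch("Wi-Fi/UDP", {"wifi"}),
    Branch("Roaming/TCP", {"cellular", "reliable", "roaming"}),
]
ordered = [
    b.name
    for b in sort_branches(
        branches,
        prohibit={"roaming"},
        require={"reliable"},
        prefer={"multistreaming"},
        avoid={"cellular"},
    )
]
```

   An implementation that weights explicitly set properties more
   heavily than defaults would replace the simple counts in the sort
   key with weighted sums.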

693	   The available protocols and paths on a specific system and in a
694	   specific context can change; therefore, the result of sorting and the
695	   outcome of racing may vary, even when using the same Selection and
696	   Connection Properties.  However, an implementation ought to provide a
697	   consistent outcome to applications, e.g., by preferring protocols and
698	   paths that are already used by existing Connections that specified
699	   similar Properties.

701	4.2.  Candidate Racing

703	   The primary goal of the Candidate Racing process is to successfully
704	   negotiate a protocol stack to an endpoint over an interface--to
705	   connect a single leaf node of the tree--with as little delay and as
706	   few unnecessary connection attempts as possible.  Optimizing these
707	   two factors improves the user experience, while minimizing network
708	   load.

710	   This section covers the dynamic aspect of connection establishment.
711	   The tree described above is a useful conceptual and architectural
712	   model.  However, an implementation is unable to know the full tree
713	   before it is formed and many of the possible branches ultimately
714	   might not be used.

716	   There are three different approaches to racing the attempts for
717	   different nodes of the connection establishment tree:

719	   1.  Immediate

721	   2.  Delayed

723	   3.  Failover

725	   Each approach is appropriate in different use-cases and branch types.
726	   However, to avoid consuming unnecessary network resources,
727	   implementations should not use immediate racing as a default
728	   approach.

730	   The timing algorithms for racing should remain independent across
731	   branches of the tree.  Any timers or racing logic are isolated to a
732	   given parent node, and are not ordered precisely with regard to the
733	   children of other nodes.

735	4.2.1.  Immediate

737	   Immediate racing is when multiple alternate branches are started
738	   without waiting for any one branch to make progress before starting
739	   the next alternative.  This means the attempts are effectively
740	   simultaneous.  Immediate racing should be avoided by implementations,
741	   since it consumes extra network resources and establishes state that
742	   might not be used.

744	4.2.2.  Delayed

746	   Delayed racing can be used whenever a single node of the tree has
747	   multiple child nodes.  Based on the order determined when building
748	   the tree, the first child node will be initiated immediately,
749	   followed by the next child node after some delay.  Once that second
750	   child node is initiated, the third child node (if present) will begin
751	   after another delay, and so on until all child nodes have been
752	   initiated, or one of the child nodes successfully completes its
753	   negotiation.

755	   Delayed racing attempts occur in parallel.  Implementations should
756	   not terminate an earlier child connection attempt upon starting a
757	   secondary child.

759	   The delay between starting child nodes should be based on the
760	   properties of the previously started child node.  For example, if the
761	   first child represents an IP address with a known route, and the
762	   second child represents another IP address, the delay between
763	   starting the first and second IP addresses can be based on the
764	   expected retransmission cadence for the first child's connection
765	   (derived from historical round-trip-time).  Alternatively, if the
766	   first child represents a branch on a Wi-Fi interface, and the second
767	   child represents a branch on an LTE interface, the delay should be
768	   based on the expected time in which the branch for the first
769	   interface would be able to establish a connection, based on link
770	   quality and historical round-trip-time.

772	   Any delay should have a defined minimum and maximum value based on
773	   the branch type.  Generally, branches between paths and protocols
774	   should have longer delays than branches between derived endpoints.
775	   The maximum delay should be considered with regard to how long a
776	   user is expected to wait for the connection to complete.

778	   If a child node fails to connect before the delay timer has fired for
779	   the next child, the next child should be started immediately.
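
   The timing rules for delayed racing can be illustrated with a small
   deterministic simulation.  This is a sketch under stated
   assumptions, not an implementation: times are integer milliseconds,
   each child's outcome is known in advance, and any in-flight failure
   that occurs before the pending timer fires starts the next child
   immediately.  The names "Child", "clamp_delay", and
   "simulate_delayed_race" are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Child:
    name: str
    connect_ms: int      # time from start until the attempt ends
    succeeds: bool       # whether the attempt ends in success
    next_delay_ms: int   # delay before starting the following sibling


def clamp_delay(expected_ms, minimum_ms, maximum_ms):
    """Bound a computed delay by the limits for its branch type."""
    return max(minimum_ms, min(expected_ms, maximum_ms))


def simulate_delayed_race(children):
    starts = []      # (name, start time) for every attempt started
    failures = []    # finish times of attempts that fail
    winner = None    # (finish time, name) of the earliest success
    for index, child in enumerate(children):
        if index == 0:
            start = 0
        else:
            last_start = starts[-1][1]
            timer = last_start + children[index - 1].next_delay_ms
            # A failure before the timer fires cuts the wait short.
            early = [f for f in failures if last_start < f < timer]
            start = min(early) if early else timer
        if winner is not None and winner[0] <= start:
            break    # already established; start nothing further
        starts.append((child.name, start))
        finish = start + child.connect_ms
        if child.succeeds:
            if winner is None or finish < winner[0]:
                winner = (finish, child.name)
        else:
            failures.append(finish)
    return starts, winner
```

   For example, with a slow first child (400 ms) and a fast second
   child (100 ms) separated by a 250 ms delay, the second child wins at
   350 ms, the first attempt is not terminated when the second starts,
   and a third child is never started.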

781	4.2.3.  Failover

783	   If an implementation or application has a strong preference for one
784	   branch over another, the branching node may choose to wait until one
785	   child has failed before starting the next.  Failure of a leaf node is
786	   determined by its protocol negotiation failing or timing out; failure
787	   of a parent branching node is determined by all of its children
788	   failing.

790	   An example in which failover is recommended is a race between a
791	   protocol stack that uses a proxy and a protocol stack that bypasses
792	   the proxy.  Failover is useful in case the proxy is down or
793	   misconfigured, but any more aggressive type of racing may end up
794	   unnecessarily avoiding a proxy that was preferred by policy.

796	4.3.  Completing Establishment

798	   The process of connection establishment completes when one leaf node
799	   of the tree has completed negotiation with the remote endpoint
800	   successfully, or else all nodes of the tree have failed to connect.
801	   The first leaf node to complete its connection is then used by the
802	   application to send and receive data.

804	   Successes and failures of a given attempt should be reported up to
805	   parent nodes (towards the trunk of the tree).  For example, in the
806	   following case, if 1.1.1 fails to connect, it reports the failure to
807	   1.1.  Since 1.1 has no other child nodes, it also has failed and
808	   reports that failure to 1.  Because 1.2 has not yet failed, 1 is not
809	   considered to have failed.  Since 1.2 has not yet started, it is
810	   started and the process continues.  Similarly, if 1.1.1 successfully
811	   connects, then it marks 1.1 as connected, which propagates to the
812	   trunk node 1.  At this point, the connection as a whole is considered
813	   to be successfully connected and ready to process application data.

815	   1 [www.example.com:80, Any, TCP]
816	     1.1 [www.example.com:80, Wi-Fi, TCP]
817	       1.1.1 [192.0.2.1:80, Wi-Fi, TCP]
818	     1.2 [www.example.com:80, LTE, TCP]
819	   ...
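
   The reporting of results towards the trunk can be sketched with a
   small tree structure.  The "Node" class and its method names are
   hypothetical; the sketch models only state propagation and leaves
   out the starting of pending siblings.

```python
class Node:
    def __init__(self, label, parent=None):
        self.label = label
        self.parent = parent
        self.children = []
        self.state = "pending"   # "pending", "connected", or "failed"
        if parent is not None:
            parent.children.append(self)

    def report_success(self):
        # Success propagates all the way to the trunk node.
        node = self
        while node is not None:
            node.state = "connected"
            node = node.parent

    def report_failure(self):
        self.state = "failed"
        parent = self.parent
        # A parent fails only once every one of its children has failed.
        if parent is not None and all(
            child.state == "failed" for child in parent.children
        ):
            parent.report_failure()


# The example tree above.
root = Node("1")
wifi = Node("1.1", root)
wifi_leaf = Node("1.1.1", wifi)
lte = Node("1.2", root)

wifi_leaf.report_failure()
states_after_failure = (wifi.state, root.state)
lte.report_success()
states_after_success = (lte.state, root.state)
```

   After the failure of 1.1.1, node 1.1 is failed but node 1 remains
   pending because 1.2 has not failed; a later success on the 1.2
   branch marks node 1 as connected.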

821	   If a leaf node has successfully completed its connection, all other
822	   attempts should be made ineligible for use by the application for the
823	   original request.  New connection attempts that involve transmitting
824	   data on the network ought not to be started after another leaf node
825	   has already successfully completed, because the connection as a whole
826	   has now been established.  An implementation may choose to let
827	   certain handshakes and negotiations complete in order to gather
828	   metrics to influence future connections.  Keeping additional
829	   connections is generally not recommended since those attempts were
830	   slower to connect and may exhibit less desirable properties.

832	4.3.1.  Determining Successful Establishment

834	   Implementations may select the criteria by which a leaf node is
835	   considered to be successfully connected differently on a per-protocol
836	   basis.  If the only protocol being used is a transport protocol with
837	   a clear handshake, like TCP, then the obvious choice is to declare
838	   that node "connected" when the last packet of the three-way handshake
839	   has been received.  If the only protocol being used is an
840	   "unconnected" protocol, like UDP, the implementation may consider the
841	   node fully "connected" the moment it determines a route is present,
842	   before sending any packets on the network (see Section 4.5).

844	   For protocol stacks with multiple handshakes, the decision becomes
845	   more nuanced.  If the protocol stack involves both TLS and TCP, an
846	   implementation could determine that a leaf node is connected after
847	   the TCP handshake is complete, or it can wait for the TLS handshake
848	   to complete as well.  The benefit of declaring completion when the
849	   TCP handshake finishes, and thus stopping the race for other branches
850	   of the tree, is that there will be less burden on the network from
851	   other connection attempts.  On the other hand, by waiting until the
852	   TLS handshake is complete, an implementation avoids the scenario in
853	   which a TCP handshake completes quickly, but TLS negotiation is
854	   either very slow or fails altogether in particular network conditions
855	   or to a particular endpoint.  To avoid the issue of TLS possibly
856	   failing, the implementation should not generate a Ready event for the
857	   Connection until TLS is established.

859	   If all of the leaf nodes fail to connect during racing, i.e., none of
860	   the configurations that satisfy all requirements given in the
861	   Transport Properties actually work over the available paths, then the
862	   transport system should notify the application with an InitiateError
863	   event.  An InitiateError event should also be generated in case the
864	   transport system finds no usable candidates to race.

866	4.4.  Establishing multiplexed connections

868	   Multiplexing several Connections over a single underlying transport
869	   connection requires that the Connections to be multiplexed belong to
870	   the same Connection Group (as is indicated by the application using
871	   the Clone call).  When the underlying transport connection supports
872	   multi-streaming, the Transport System can map each Connection in the
873	   Connection Group to a different stream.  Thus, when the Connections
874	   that are offered to an application by the Transport System are
875	   multiplexed, the Transport System may implement the establishment of
876	   a new Connection by simply beginning to use a new stream of an
877	   already established transport connection and there is no need for a
878	   connection establishment procedure.  This, then, also means that
879	   there may not be any "establishment" message (like a TCP SYN), but
880	   the application can simply start sending or receiving.  Therefore,
881	   when the Initiate action of a Transport System is called without
882	   Messages being handed over, it cannot be guaranteed that the other
883	   endpoint will have any way to know about this, and hence a passive
884	   endpoint's ConnectionReceived event may not be generated upon an
885	   active endpoint's Initiate.  Instead, the ConnectionReceived event
886	   may be delayed until the first Message arrives.

888	4.5.  Handling racing with "unconnected" protocols

890	   While protocols that use an explicit handshake to validate a
891	   Connection to a peer can be used for racing multiple establishment
892	   attempts in parallel, "unconnected" protocols such as raw UDP do not
893	   offer a way to validate the presence of a peer or the usability of a
894	   Connection without application feedback.  An implementation should
895	   consider such a protocol stack to be established as soon as a local
896	   route to the peer endpoint is confirmed.

898	   However, if a peer is not reachable over the network using the
899	   unconnected protocol, or data cannot be exchanged for any other
900	   reason, the application may want to attempt using another candidate
901	   Protocol Stack.  The implementation should maintain the list of other
902	   candidate Protocol Stacks that were eligible to use.

904	4.6.  Implementing listeners

906	   When an implementation is asked to Listen, it registers with the
907	   system to wait for incoming traffic to the Local Endpoint.  If no
908	   Local Endpoint is specified, the implementation should use an
909	   ephemeral port.

911	   If the Selection Properties do not require a single network interface
912	   or path, but allow the use of multiple paths, the Listener object
913	   should register for incoming traffic on all of the network interfaces
914	   or paths that conform to the Properties.  The set of available paths
915	   can change over time, so the implementation should monitor network
916	   path changes and register and de-register the Listener across all
917	   usable paths.  When using multiple paths, the Listener is generally
918	   expected to use the same port for listening on each.

920	   If the Selection Properties allow multiple protocols to be used for
921	   listening, and the implementation supports it, the Listener object
922	   should support receiving inbound connections for each eligible
923	   protocol on each eligible path.

925	4.6.1.  Implementing listeners for Connected Protocols

927	   Connected protocols such as TCP and TLS-over-TCP have a strong
928	   mapping between the Local and Remote Endpoints (five-tuple) and their
929	   protocol connection state.  These map into Connection objects.
930	   Whenever a new inbound handshake is being started, the Listener
931	   should generate a new Connection object and pass it to the
932	   application.

934	4.6.2.  Implementing listeners for Unconnected Protocols

936	   Unconnected protocols such as UDP and UDP-Lite generally do not
937	   provide the same mechanisms that connected protocols do to offer
938	   Connection objects.  Implementations should wait for incoming packets
939	   for unconnected protocols on a listening port and should perform
940	   five-tuple matching of packets to existing Connection objects,
941	   creating new Connection objects when none match.  On platforms with
942	   facilities to create a "virtual connection" for unconnected
943	   protocols, implementations should use these mechanisms to minimise
944	   the handling of datagrams intended for existing Connection objects.
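
   Five-tuple matching for an unconnected protocol can be sketched as a
   lookup table keyed by protocol, local address and port, and remote
   address and port.  The class "DatagramListener" and its attributes
   are hypothetical, and the sketch stores payloads in plain lists in
   place of real Connection objects.

```python
class DatagramListener:
    def __init__(self, protocol, local_addr, local_port):
        self.local = (protocol, local_addr, local_port)
        self.connections = {}       # five-tuple -> received datagrams
        self.new_connections = []   # five-tuples to announce to the app

    def on_datagram(self, remote_addr, remote_port, payload):
        key = self.local + (remote_addr, remote_port)
        if key not in self.connections:
            # First datagram from this peer: create a new Connection.
            self.connections[key] = []
            self.new_connections.append(key)
        # Deliver to the matching (possibly just created) Connection.
        self.connections[key].append(payload)
        return key


listener = DatagramListener("UDP", "192.0.2.10", 5000)
first = listener.on_datagram("198.51.100.7", 40000, b"hello")
again = listener.on_datagram("198.51.100.7", 40000, b"again")
other = listener.on_datagram("198.51.100.8", 40001, b"hi")
```

   Here the first two datagrams map to the same Connection, while the
   third creates a second one.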

946	4.6.3.  Implementing listeners for Multiplexed Protocols

948	   Protocols that provide multiplexing of streams into a single five-
949	   tuple can listen both for entirely new connections (a new HTTP/2
950	   stream on a new TCP connection, for example) and for new sub-
951	   connections (a new HTTP/2 stream on an existing connection).  If the
952	   abstraction of Connection presented to the application is mapped to
953	   the multiplexed stream, then the Listener should deliver new
954	   Connection objects in the same way for either case.  The
955	   implementation should allow the application to introspect the
956	   Connection Group marked on the Connections to determine the grouping
957	   of the multiplexing.

959	5.  Implementing Sending and Receiving Data

961	   The most basic mapping for sending a Message is an abstraction of
962	   datagrams, in which the transport protocol naturally deals in
963	   discrete packets.  Each Message here corresponds to a single
964	   datagram.  Generally, these will be short enough that sending and
965	   receiving will always use a complete Message.

967	   For protocols that expose byte-streams, the only delineation provided
968	   by the protocol is the end of the stream in a given direction.  Each
969	   Message in this case corresponds to the entire stream of bytes in a
970	   direction.  These Messages may be quite long, in which case they can
971	   be sent in multiple parts.

973	   Protocols that provide framing (such as length-value protocols, or
974	   protocols that use delimiters) delimit data units that may be
975	   longer than a traditional datagram.  Each Message for framing
976	   protocols corresponds to a single frame, which may be sent either as
977	   a complete Message, or in multiple parts.

979	5.1.  Sending Messages

981	   The effect of the application sending a Message is determined by the
982	   top-level protocol in the established Protocol Stack.  That is, if
983	   the top-level protocol provides an abstraction of framed messages
984	   over a connection, the receiving application will be able to obtain
985	   multiple Messages on that connection, even if the framing protocol is
986	   built on a byte-stream protocol like TCP.

988	5.1.1.  Message Properties

990	   *  Lifetime: this should be implemented by removing the Message from
991	      the queue of pending Messages after the Lifetime has expired.  A
992	      queue of pending Messages within the transport system
993	      implementation that have yet to be handed to the Protocol Stack
994	      can always support this property, but once a Message has been sent
995	      into the send buffer of a protocol, only certain protocols may
996	      support removing a message.  For example, an implementation cannot
997	      remove bytes from a TCP send buffer, while it can remove data from
998	      a SCTP send buffer using the partial reliability extension
999	      [RFC8303].  When there is no standing queue of Messages within the
1000	      system, and the Protocol Stack does not support the removal of a
1001	      Message from the stack's send buffer, this property may be
1002	      ignored.

1004	   *  Priority: this represents the ability to prioritize a Message over
1005	      other Messages.  This can be implemented by the system re-ordering
1006	      Messages that have yet to be handed to the Protocol Stack, or by
1007	      giving relative priority hints to protocols that support
1008	      priorities per Message.  For example, an implementation of HTTP/2
1009	      could choose to send Messages of different Priority on streams of
1010	      different priority.

1012	   *  Ordered: when this is false, this disables the requirement of in-
1013	      order-delivery for protocols that support configurable ordering.

1015	   *  Safely Replayable: when this is true, this means that the Message
1016	      can be used by mechanisms that might transfer it multiple times -
1017	      e.g., as a result of racing multiple transports or as part of TCP
1018	      Fast Open.  Also, protocols that do not protect against duplicated
1019	      messages, such as UDP, can only be used with Messages that are
1020	      Safely Replayable.

1022	   *  Final: when this is true, this means that a transport connection
1023	      can be closed immediately after transmission of the message.

1025	   *  Corruption Protection Length: when this is set to any value other
1026	      than "Full Coverage", it sets the minimum protection in protocols
1027	      that allow limiting the checksum length (e.g., UDP-Lite).

1029	   *  Reliable Data Transfer (Message): When true, the property
1030	      specifies that the Message must be reliably transmitted.  When
1031	      false, and if unreliable transmission is supported by the
1032	      underlying protocol, then the Message should be unreliably
1033	      transmitted.  If the underlying protocol does not support
1034	      unreliable transmission, the Message should be reliably
1035	      transmitted.

1037	   *  Message Capacity Profile Override: When true, this expresses a
1038	      wish to override the Generic Connection Property "Capacity
1039	      Profile" for this Message.  Depending on the value, this can, for
1040	      example, be implemented by changing the DSCP value of the
1041	      associated packet (note that the guidelines in Section 6 of
1042	      [RFC7657] apply; e.g., the DSCP value should not be changed for
1043	      different packets within a reliable transport protocol session or
1044	      DCCP connection).

1046	   *  No Fragmentation: When set, this property limits the message size
1047	      to the Maximum Message Size Before Fragmentation or Segmentation
1048	      (see Section 10.1.7 of [I-D.ietf-taps-interface]).  Messages
1049	      larger than this size generate an error.  Setting this avoids
1050	      transport-layer segmentation or network-layer fragmentation.  When
1051	      used with transports running over IP version 4 the Don't Fragment
1052	      bit will be set to avoid on-path IP fragmentation ([RFC8304]).
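
   The Lifetime and Priority properties above can be illustrated with a
   queue of pending Messages that have not yet been handed to the
   Protocol Stack.  This is a minimal sketch: the class "PendingQueue"
   is hypothetical, time is passed in explicitly, and the convention
   that lower numbers are dequeued first is an assumption of the sketch
   and is not taken from the interface specification.

```python
import heapq
import itertools


class PendingQueue:
    def __init__(self):
        self._heap = []
        # The counter preserves FIFO order within one priority level.
        self._counter = itertools.count()

    def enqueue(self, message, now, priority=100, lifetime=None):
        expires = None if lifetime is None else now + lifetime
        entry = (priority, next(self._counter), expires, message)
        heapq.heappush(self._heap, entry)

    def dequeue(self, now):
        while self._heap:
            _, _, expires, message = heapq.heappop(self._heap)
            if expires is not None and now >= expires:
                continue   # Lifetime expired before reaching the stack
            return message
        return None        # nothing left to send


queue = PendingQueue()
queue.enqueue("stale", now=0, priority=1, lifetime=5)
queue.enqueue("bulk", now=0, priority=200)
queue.enqueue("urgent", now=0, priority=10)
sent = [queue.dequeue(now=10), queue.dequeue(now=10), queue.dequeue(now=10)]
```

   Once a Message has left this queue and entered a protocol's send
   buffer, removing it depends on the protocol, as described for the
   Lifetime property.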

1054	5.1.2.  Send Completion

1056	   The application should be notified whenever a Message or partial
1057	   Message has been consumed by the Protocol Stack, or has failed to
1058	   send.  The meaning of the Message being consumed by the stack may
1059	   vary depending on the protocol.  For a basic datagram protocol like
1060	   UDP, this may correspond to the time when the packet is sent into the
1061	   interface driver.  For a protocol that buffers data in queues, like
1062	   TCP, this may correspond to when the data has entered the send
1063	   buffer.

1065	5.1.3.  Batching Sends

1067	   Since sending a Message may involve a context switch between the
1068	   application and the transport system, sending patterns that involve
1069	   multiple small Messages can incur high overhead if each needs to be
1070	   enqueued separately.  To avoid this, the application can indicate a
1071	   batch of Send actions through the API.  When this is used, the
1072	   implementation should hold off on processing Messages until the batch
1073	   is complete.

1075	5.2.  Receiving Messages

1077	   Similar to sending, Receiving a Message is determined by the top-
1078	   level protocol in the established Protocol Stack.  The main
1079	   difference with Receiving is that the size and boundaries of the
1080	   Message are not known beforehand.  The application can communicate in
1081	   its Receive action the parameters for the Message, which can help the
1082	   implementation know how much data to deliver and when.  For example,
1083	   if the application only wants to receive a complete Message, the
1084	   implementation should wait until an entire Message (datagram, stream,
1085	   or frame) is read before delivering any Message content to the
1086	   application.  This requires the implementation to understand where
1087	   messages end, either via a supplied deframer or because the top-level
1088	   protocol in the established Protocol Stack preserves message
1089	   boundaries.  If the top-level protocol only supports a byte-stream
1090	   and no framers are supplied, the application can control the flow
1091	   of received data by specifying the minimum number of bytes of Message
1092	   content it wants to receive at one time.
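
   Delivering only complete Messages over a byte-stream can be sketched
   with a simple deframer.  The framing used here, a 4-byte big-endian
   length followed by the payload, is an assumption chosen for
   illustration, and the names "LengthPrefixedDeframer" and "frame" are
   hypothetical.

```python
import struct


class LengthPrefixedDeframer:
    """Splits a byte stream into length-prefixed Messages."""

    HEADER = struct.Struct("!I")   # 4-byte big-endian payload length

    def __init__(self):
        self._buffer = bytearray()

    def feed(self, data):
        """Add received bytes; return any Messages now complete."""
        self._buffer.extend(data)
        messages = []
        while len(self._buffer) >= self.HEADER.size:
            (length,) = self.HEADER.unpack_from(self._buffer)
            end = self.HEADER.size + length
            if len(self._buffer) < end:
                break              # wait for the rest of this Message
            messages.append(bytes(self._buffer[self.HEADER.size:end]))
            del self._buffer[:end]
        return messages


def frame(payload):
    """Prefix a payload with its length for sending."""
    return LengthPrefixedDeframer.HEADER.pack(len(payload)) + payload


stream = frame(b"hello") + frame(b"world!")
deframer = LengthPrefixedDeframer()
partial = deframer.feed(stream[:6])    # header plus two payload bytes
complete = deframer.feed(stream[6:])   # remainder of the stream
```

   The first call returns nothing because the first Message is still
   incomplete; the second call returns both Messages with their
   boundaries preserved.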

1094	   If a Connection becomes finished before a requested Receive action
1095	   can be satisfied, the implementation should deliver any partial
1096	   Message content outstanding, or if none is available, an indication
1097	   that there will be no more received Messages.

1099	5.3.  Handling of data for fast-open protocols

1101	   Several protocols allow sending higher-level protocol or application
1102	   data within the first packet of their protocol establishment, such as
1103	   TCP Fast Open [RFC7413] and TLS 1.3 [RFC8446].  This approach is
1104	   referred to as sending Zero-RTT (0-RTT) data.  This is a desirable
1105	   property, but poses challenges to an implementation that uses racing
1106	   during connection establishment.

1108	   If the application has 0-RTT data to send in any protocol handshakes,
1109	   it needs to provide this data before the handshakes have begun.  When
1110	   racing, this means that the data should be provided before the
1111	   process of connection establishment has begun.  If the application
1112	   wants to send 0-RTT data, it must indicate this to the implementation
1113	   by setting the "Safely Replayable" send parameter to true when
1114	   sending the data.  In general, 0-RTT data may be replayed (for
1115	   example, if a TCP SYN contains data, and the SYN is retransmitted,
1116	   the data will be retransmitted as well but may be considered as a new
1117	   connection instead of a retransmission).  Also, when racing
1118	   connections, different leaf nodes have the opportunity to send the
1119	   same data independently.  If data is truly safely replayable, this
1120	   should be permissible.

1122	   Once the application has provided its 0-RTT data, an implementation
1123	   should keep a copy of this data and provide it to each new leaf node
1124	   that is started and for which a 0-RTT protocol is being used.

1126	   It is also possible that protocol stacks within a particular leaf
1127	   node use 0-RTT handshakes without any safely replayable application
1128	   data.  For example, TCP Fast Open could use a Client Hello from TLS
1129	   as its 0-RTT data, shortening the cumulative handshake time.

1131	   0-RTT handshakes often rely on previous state, such as TCP Fast Open
1132	   cookies, previously established TLS tickets, or out-of-band
1133	   distributed pre-shared keys (PSKs).  Implementations should be aware
1134	   of security concerns around using these tokens across multiple
1135	   addresses or paths when racing.  In the case of TLS, any given ticket
1136	   or PSK should only be used on one leaf node, since servers will
1137	   likely reject duplicate tickets in order to prevent replays (see
1138	   Section 8.1 of [RFC8446]).  If implementations have multiple tickets
1139	   available from a previous connection, each leaf node attempt can use
1140	   a different ticket.  In effect, each leaf node will send the same
1141	   early application data, yet encoded (encrypted) differently on the
1142	   wire.
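
   As a rough sketch of the bookkeeping described in this section, the
   following Python fragment keeps one copy of the application's 0-RTT
   data and gives each racing leaf node its own TLS ticket.  The names
   "LeafAttempt" and "ZeroRttRacer" are hypothetical, and the sketch
   assumes TLS 1.3 is the only 0-RTT protocol in the stack; a leaf that
   cannot get an unused ticket simply falls back to a full handshake.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LeafAttempt:
    """One racing candidate (hypothetical type, not defined by the draft)."""
    name: str
    early_data: Optional[bytes] = None
    ticket: Optional[str] = None


@dataclass
class ZeroRttRacer:
    """Keeps the application's 0-RTT data for every leaf that is started."""
    early_data: bytes
    safely_replayable: bool
    tickets: List[str] = field(default_factory=list)  # unused TLS tickets

    def start_leaf(self, name: str) -> LeafAttempt:
        leaf = LeafAttempt(name)
        if not self.safely_replayable:
            # Data not marked Safely Replayable is never sent as 0-RTT.
            return leaf
        if not self.tickets:
            # No unused ticket left: do a full handshake on this leaf
            # rather than reuse a ticket on a second leaf node.
            return leaf
        leaf.ticket = self.tickets.pop(0)  # each ticket is used only once
        leaf.early_data = self.early_data  # same application data everywhere
        return leaf
```

   Each leaf that receives a ticket sends identical application data,
   but because the tickets differ, the bytes on the wire differ too.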

1144	6.  Implementing Message Framers

1146	   Message Framers are pieces of code that define simple transformations
1147	   between application Message data and raw transport protocol data.  A
1148	   Framer can encapsulate or encode outbound Messages, and decapsulate
1149	   or decode inbound data into Messages.

1151	   While many protocols can be represented as Message Framers, for the
1152	   purposes of the Transport Services interface these are ways for
1153	   applications or application frameworks to define their own Message
1154	   parsing to be included within a Connection's Protocol Stack.  As an
1155	   example, TLS can serve the purpose of framing data over TCP, but is
1156	   exposed as a protocol natively supported by the Transport Services
1157	   interface.

1159	   Most Message Framers fall into one of two categories:

1161	   *  Header-prefixed record formats, such as a basic Type-Length-Value
1162	      (TLV) structure

1164	   *  Delimiter-separated formats, such as HTTP/1.1.

1166	   Common Message Framers can be provided by the Transport Services
1167	   implementation, but an implementation ought to allow custom Message
1168	   Framers to be defined by the application or some other piece of
1169	   software.  This section describes one possible interface for defining
1170	   Message Framers as an example.

1172	6.1.  Defining Message Framers

1174	   A Message Framer is primarily defined by the set of code that handles
1175	   events for a framer implementation, specifically how it handles
1176	   inbound and outbound data parsing.  The piece of code that implements
1177	   custom framing logic will be referred to as the "framer
1178	   implementation", which may be provided by the Transport Services
1179	   implementation or the application itself.  The Message Framer refers
1180	   to the object or piece of code within the main Connection
1181	   implementation that delivers events to the custom framer
1182	   implementation whenever data is ready to be parsed or framed.

1184	   When a Connection establishment attempt begins, an event can be
1185	   delivered to notify the framer implementation that a new Connection
1186	   is being created.  Similarly, a stop event can be delivered when a
1187	   Connection is being torn down.  The framer implementation can use the
1188	   Connection object to look up specific properties of the Connection or
1189	   the network being used that may influence how to frame Messages.

1191	   MessageFramer -> Start(Connection)
1192	   MessageFramer -> Stop(Connection)

1194	   When a Message Framer generates a "Start" event, the framer
1195	   implementation has the opportunity to start writing some data prior
1196	   to the Connection delivering its "Ready" event.  This allows the
1197	   implementation to communicate control data to the remote endpoint
1198	   that can be used to parse Messages.

1200	   MessageFramer.MakeConnectionReady(Connection)

1202	   Similarly, when a Message Framer generates a "Stop" event, the framer
1203	   implementation has the opportunity to write some final data or clear
1204	   up its local state before the "Closed" event is delivered to the
1205	   Application.  The framer implementation can indicate that it has
1206	   finished with this.

1208	   MessageFramer.MakeConnectionClosed(Connection)

1209	   At any time, if the implementation encounters a fatal error, it can
1210	   also cause the Connection to fail and provide an error.

1212	   MessageFramer.FailConnection(Connection, Error)

1214	   Should the framer implementation deem the candidate selected during
1215	   racing unsuitable, it can signal this by failing the Connection prior
1216	   to marking it as ready.  If there are no other candidates available,
1217	   the Connection will fail.  Otherwise, the Connection will select a
1218	   different candidate and the Message Framer will generate a new
1219	   "Start" event.

1221	   Before an implementation marks a Message Framer as ready, it can also
1222	   dynamically add a protocol or framer above it in the stack.  This
1223	   allows protocols like STARTTLS, that need to add TLS conditionally,
1224	   to modify the Protocol Stack based on a handshake result.

1226	   otherFramer := NewMessageFramer()
1227	   MessageFramer.PrependFramer(Connection, otherFramer)

1229	6.2.  Sender-side Message Framing

1231	   Message Framers generate an event whenever a Connection sends a new
1232	   Message.

1234	MessageFramer -> NewSentMessage<Connection, MessageData, MessageContext, IsEndOfMessage>

1236	   Upon receiving this event, a framer implementation is responsible for
1237	   performing any necessary transformations and sending the resulting
1238	   data back to the Message Framer, which will in turn send it to the
1239	   next protocol.  Implementations SHOULD ensure that there is a way to
1240	   pass the original data through without copying to improve
1241	   performance.

1243	   MessageFramer.Send(Connection, Data)

1245	   To provide an example, a simple protocol that adds a length as a
1246	   header would receive the "NewSentMessage" event, create a data
1247	   representation of the length of the Message data, and then send a
1248	   block of data that is the concatenation of the length header and the
1249	   original Message data.
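
   The length-prefix example just described might look like the
   following Python sketch.  The 4-byte big-endian header format and
   the class name "LengthPrefixSender" are assumptions made for
   illustration; the "send" callable stands in for
   "MessageFramer.Send(Connection, Data)".

```python
import struct
from typing import Callable

# Assumed wire format: a 4-byte unsigned big-endian length header.
HEADER = struct.Struct("!I")


class LengthPrefixSender:
    """Hypothetical framer implementation for the sending direction."""

    def __init__(self, send: Callable[[bytes], None]) -> None:
        # `send` plays the role of MessageFramer.Send(Connection, Data).
        self._send = send

    def new_sent_message(self, message_data: bytes) -> None:
        # Handler for the NewSentMessage event: prepend the length and
        # pass the result on towards the next protocol.
        header = HEADER.pack(len(message_data))
        self._send(header + message_data)
```

   The concatenation copies the Message data, which keeps the sketch
   short.  An implementation that wants to avoid the copy could hand
   the header and the original buffer to the next protocol separately.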

1251	6.3.  Receiver-side Message Framing

1253	   In order to parse a received flow of data into Messages, the Message
1254	   Framer notifies the framer implementation whenever new data is
1255	   available to parse.

1257	   MessageFramer -> HandleReceivedData<Connection>

1259	   Upon receiving this event, the framer implementation can inspect the
1260	   inbound data.  The data is parsed from a particular cursor
1261	   representing the unprocessed data.  The application requests a
1262	   specific amount of data it needs to have available in order to parse.
1263	   If the data is not available, the parse fails.

1265	MessageFramer.Parse(Connection, MinimumIncompleteLength, MaximumLength) -> (Data, MessageContext, IsEndOfMessage)

1267	   The framer implementation can directly advance the receive cursor
1268	   once it has parsed data to effectively discard data (for example,
1269	   discard a header once the content has been parsed).

1271	   To deliver a Message to the application, the framer implementation
1272	   can either directly deliver data that it has allocated, or deliver a
1273	   range of data directly from the underlying transport and
1274	   simultaneously advance the receive cursor.

1276	MessageFramer.AdvanceReceiveCursor(Connection, Length)
1277	MessageFramer.DeliverAndAdvanceReceiveCursor(Connection, MessageContext, Length, IsEndOfMessage)
1278	MessageFramer.Deliver(Connection, MessageContext, Data, IsEndOfMessage)

1280	   Note that "MessageFramer.DeliverAndAdvanceReceiveCursor" allows the
1281	   framer implementation to earmark bytes as part of a Message even
1282	   before they are received by the transport.  This allows the delivery
1283	   of very large Messages without requiring the implementation to
1284	   directly inspect all of the bytes.

1286	   To provide an example, a simple protocol that parses a length as a
1287	   header value would receive the "HandleReceivedData" event, and call
1288	   "Parse" with a minimum and maximum set to the length of the header
1289	   field.  Once the parse succeeded, it would call
1290	   "AdvanceReceiveCursor" with the length of the header field, and then
1291	   call "DeliverAndAdvanceReceiveCursor" with the length of the body
1292	   that was parsed from the header, marking the new Message as complete.
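
   The receiving side of the same length-prefix example can be sketched
   in Python as follows.  "ReceiveBuffer" is a hypothetical stand-in
   for the Message Framer's cursor over unprocessed data, and the
   4-byte big-endian header is an assumed format.  As a simplification,
   this sketch waits until the whole body has arrived instead of
   earmarking bytes that have not been received yet.

```python
import struct
from typing import List, Optional

HEADER = struct.Struct("!I")  # assumed 4-byte big-endian length header


class ReceiveBuffer:
    """Hypothetical view of the unprocessed inbound data."""

    def __init__(self) -> None:
        self._data = bytearray()
        self.delivered: List[bytes] = []  # complete Messages handed up

    def feed(self, chunk: bytes) -> None:
        self._data += chunk

    def parse(self, minimum: int, maximum: int) -> Optional[bytes]:
        # Parse: fails (returns None) unless `minimum` bytes are available.
        if len(self._data) < minimum:
            return None
        return bytes(self._data[:maximum])

    def advance_receive_cursor(self, length: int) -> None:
        del self._data[:length]

    def deliver_and_advance_receive_cursor(self, length: int) -> None:
        self.delivered.append(bytes(self._data[:length]))
        del self._data[:length]


def handle_received_data(buf: ReceiveBuffer) -> None:
    """Handler for the HandleReceivedData event."""
    while True:
        header = buf.parse(HEADER.size, HEADER.size)
        if header is None:
            return  # header incomplete; wait for the next event
        (body_length,) = HEADER.unpack(header)
        total = HEADER.size + body_length
        if buf.parse(total, total) is None:
            return  # body incomplete; leave the header in place
        buf.advance_receive_cursor(HEADER.size)  # discard the header
        buf.deliver_and_advance_receive_cursor(body_length)
```

   The header is only discarded once the body is known to be present,
   so a partial Message never leaves the cursor in the middle of a
   frame.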

1294	7.  Implementing Connection Management

1296	   Once a Connection is established, the Transport Services system
1297	   allows applications to interact with the Connection by modifying or
1298	   inspecting Connection Properties.  A Connection can also generate
1299	   events in the form of Soft Errors.

1301	   The set of Connection Properties that are supported for setting and
1302	   getting on a Connection are described in [I-D.ietf-taps-interface].
1303	   For any properties that are generic, and thus could apply to all
1304	   protocols being used by a Connection, the Transport System should
1305	   store the properties in a generic storage, and notify all protocol
1306	   instances in the Protocol Stack whenever the properties have been
1307	   modified by the application.  For protocol-specific properties, such
1308	   as the User Timeout that applies to TCP, the Transport System only
1309	   needs to update the relevant protocol instance.

1311	   If an error is encountered in setting a property (for example, if the
1312	   application tries to set a TCP-specific property on a Connection that
1313	   is not using TCP), the action should fail gracefully.  The
1314	   application may be informed of the error, but the Connection itself
1315	   should not be terminated.
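
   One possible shape for this property handling is sketched below in
   Python.  The classes and the property names "priority" and
   "tcp.user_timeout" are hypothetical placeholders, not names taken
   from [I-D.ietf-taps-interface].  Generic properties are stored once
   and pushed to every protocol instance, protocol-specific ones go
   only to the instance that understands them, and a failed set returns
   an error without touching the Connection's state.

```python
from typing import Any, Dict, Iterable, List, Optional


class ProtocolInstance:
    """Hypothetical protocol instance in a Protocol Stack."""

    def __init__(self, name: str, specific: Iterable[str] = ()) -> None:
        self.name = name
        self.specific = set(specific)   # protocol-specific property names
        self.properties: Dict[str, Any] = {}


class Connection:
    GENERIC = {"priority"}  # placeholder set of generic property names

    def __init__(self, stack: List[ProtocolInstance]) -> None:
        self.stack = stack
        self.generic: Dict[str, Any] = {}
        self.open = True

    def set_property(self, key: str, value: Any) -> Optional[str]:
        """Return None on success, or an error string on failure."""
        if key in self.GENERIC:
            self.generic[key] = value
            for instance in self.stack:      # notify every instance
                instance.properties[key] = value
            return None
        for instance in self.stack:
            if key in instance.specific:     # only the relevant instance
                instance.properties[key] = value
                return None
        # Fail gracefully: report the error, leave the Connection alone.
        return f"property {key!r} does not apply to this Connection"
```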

1317	   The Transport Services implementation should allow protocol instances
1318	   in the Protocol Stack to pass up arbitrary generic or protocol-
1319	   specific errors that can be delivered to the application as Soft
1320	   Errors.  These allow the application to be informed of ICMP errors,
1321	   and other similar events.

1323	7.1.  Pooled Connection

1325	   For protocols that employ request/response pairs and do not require
1326	   in-order delivery of the responses, like HTTP, the transport
1327	   implementation may distribute interactions across several underlying
1328	   transport connections.  For these kinds of protocols, implementations
1329	   may hide the connection management and only expose a single
1330	   Connection object and the individual requests/responses as messages.
1331	   These Pooled Connections can use multiple connections or multiple
1332	   streams of multi-streaming connections between endpoints, as long as
1333	   all of these satisfy the requirements and prohibitions specified in
1334	   the Selection Properties of the Pooled Connection.  This enables
1335	   implementations to realize transparent connection coalescing,
1336	   connection migration, and to perform per-message endpoint and path
1337	   selection by choosing among these underlying connections.

1339	7.2.  Handling Path Changes

1341	   When a path change occurs, the Transport Services implementation is
1342	   responsible for notifying Protocol Instances in the Protocol Stack.
1343	   If the Protocol Stack includes a transport protocol that supports
1344	   multipath connectivity, an update to the available paths should
1345	   inform the Protocol Instance of the new set of paths that are
1346	   permissible based on the Selection Properties passed by the
1347	   application.  A multipath protocol can establish new subflows over
1348	   new paths, and should tear down subflows over paths that are no
1349	   longer available.  Pooled Connections (Section 7.1) may add or remove
1350	   underlying transport connections in a similar manner.  If the
1351	   Protocol Stack includes a transport protocol that does not support
1352	   multipath, but supports migrating between paths, the update to
1353	   available paths can be used as the trigger for migrating the
1354	   connection.  For protocols that do not support multipath or
1355	   migration, the Protocol Instances may be informed of the path change,
1356	   but should not be forcibly disconnected if the previously used path
1357	   becomes unavailable.  An exception to this case is if the System
1358	   Policy changes to prohibit traffic from the Connection based on its
1359	   properties, in which case the Protocol Stack should be disconnected.
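
   The cases above reduce to a small decision function, sketched here
   in Python.  The enum and function names are hypothetical.  The text
   states the System Policy exception only for protocols without
   multipath or migration support; checking it first for every
   protocol is an assumption of this sketch.

```python
from enum import Enum, auto


class PathSupport(Enum):
    MULTIPATH = auto()   # can use several paths at once
    MIGRATION = auto()   # single path, but can move to another one
    NONE = auto()        # neither multipath nor migration


class Action(Enum):
    UPDATE_SUBFLOWS = auto()  # add subflows on new paths, drop dead ones
    MIGRATE = auto()          # move the connection to an available path
    INFORM_ONLY = auto()      # notify, but do not forcibly disconnect
    DISCONNECT = auto()       # System Policy now prohibits the traffic


def on_path_change(support: PathSupport, policy_prohibits: bool) -> Action:
    """Pick the reaction of a Protocol Instance to a path change."""
    if policy_prohibits:
        # Assumption: the policy check applies to every kind of protocol.
        return Action.DISCONNECT
    if support is PathSupport.MULTIPATH:
        return Action.UPDATE_SUBFLOWS
    if support is PathSupport.MIGRATION:
        return Action.MIGRATE
    return Action.INFORM_ONLY
```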

1361	8.  Implementing Connection Termination

1363	   With TCP, when an application closes a connection, this means that it
1364	   has no more data to send (but expects all data that has been handed
1365	   over to be reliably delivered).  However, with TCP only, "close" does
1366	   not mean that the application will stop receiving data.  This is
1367	   related to TCP's ability to support half-closed connections.

1369	   SCTP is an example of a protocol that does not support such half-
1370	   closed connections.  Hence, with SCTP, the meaning of "close" is
1371	   stricter: an application has no more data to send (but expects all
1372	   data that has been handed over to be reliably delivered), and will
1373	   also not receive any more data.

1375	   Implementing a protocol-independent transport system means that the
1376	   exposed semantics must be the strictest subset of the semantics of
1377	   all supported protocols.  Hence, as is common with all reliable
1378	   transport protocols, after a Close action, the application can expect
1379	   to have its reliability requirements honored regarding the data it
1380	   has given to the Transport System, but it cannot expect to be able to
1381	   read any more data after calling Close.

1383	   Abort differs from Close only in that no guarantees are given
1384	   regarding data that the application has handed over to the Transport
1385	   System before calling Abort.

1387	   As explained in Section 4.4, when a new stream is multiplexed on an
1388	   already existing connection of a Transport Protocol Instance, there
1389	   is no need for a connection establishment procedure.  Because the
1390	   Connections that are offered by the Transport System can be
1391	   implemented as streams that are multiplexed on a transport protocol's
1392	   connection, it can therefore not be guaranteed that one Endpoint's
1393	   Initiate action provokes a ConnectionReceived event at its peer.

1395	   For Close (provoking a Finished event) and Abort (provoking a
1396	   ConnectionError event), the same logic applies: while it is desirable
1397	   to be informed when a peer closes or aborts a Connection, whether
1398	   this is possible depends on the underlying protocol, and no
1399	   guarantees can be given.  With SCTP, the transport system can use the
1400	   stream reset procedure to cause a Finished event upon a Close action
1401	   from the peer [NEAT-flow-mapping].

1403	9.  Cached State

1405	   Beyond a single Connection's lifetime, it is useful for an
1406	   implementation to keep state and history.  This cached state can help
1407	   improve future Connection establishment due to re-using results and
1408	   credentials, and favoring paths and protocols that performed well in
1409	   the past.

1411	   Cached state may be associated with different Endpoints for the same
1412	   Connection, depending on the protocol generating the cached content.
1413	   For example, session tickets for TLS are associated with specific
1414	   endpoints, and thus should be cached based on a Connection's hostname
1415	   Endpoint (if applicable).  On the other hand, performance
1416	   characteristics of a path are more likely tied to the IP address and
1417	   subnet being used.

1419	9.1.  Protocol state caches

1421	   Some protocols will have long-term state to be cached in association
1422	   with Endpoints.  This state often has some time after which it is
1423	   expired, so the implementation should allow each protocol to specify
1424	   an expiration for cached content.

1426	   Examples of cached protocol state include:

1428	   *  The DNS protocol can cache resolution answers (A and AAAA queries,
1429	      for example), associated with a Time To Live (TTL) to be used for
1430	      future hostname resolutions without asking the DNS
1431	      resolver again.

1433	   *  TLS caches session state and tickets based on a hostname, which
1434	      can be used for resuming sessions with a server.

1436	   *  TCP can cache cookies for use in TCP Fast Open.

1438	   Cached protocol state is primarily used during Connection
1439	   establishment for a single Protocol Stack, but may be used to
1440	   influence an implementation's preference between several candidate
1441	   Protocol Stacks.  For example, if two IP address Endpoints are
1442	   otherwise equally preferred, an implementation may choose to attempt
1443	   a connection to an address for which it has a TCP Fast Open cookie.

1445	   Applications must have a way to flush protocol cache state if
1446	   desired.  This may be necessary, for example, if application-layer
1447	   identifiers rotate and clients wish to avoid linkability via
1448	   trackable TLS tickets or TFO cookies.
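
   A protocol state cache with per-entry expiration and an application
   flush could be sketched in Python as follows.  The class name, the
   (protocol, endpoint) key layout, and the injectable clock are
   illustrative assumptions, not requirements of this document.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class ProtocolStateCache:
    """Hypothetical cache keyed by (protocol, endpoint)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # Maps (protocol, endpoint) to (value, absolute expiry time).
        self._entries: Dict[Tuple[str, str], Tuple[Any, float]] = {}

    def put(self, protocol: str, endpoint: str, value: Any,
            lifetime: float) -> None:
        # Each protocol chooses its own lifetime, e.g. a DNS TTL.
        expires = self._clock() + lifetime
        self._entries[(protocol, endpoint)] = (value, expires)

    def get(self, protocol: str, endpoint: str) -> Optional[Any]:
        key = (protocol, endpoint)
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires = entry
        if self._clock() >= expires:
            del self._entries[key]  # expired: drop it lazily on lookup
            return None
        return value

    def flush(self, protocol: Optional[str] = None) -> None:
        """Drop all state, or only the state of one protocol."""
        if protocol is None:
            self._entries.clear()
        else:
            self._entries = {key: entry
                             for key, entry in self._entries.items()
                             if key[0] != protocol}
```

   An application that rotates its identifiers could call "flush" for
   the TLS and TCP entries to discard tickets and Fast Open cookies.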

1450	9.2.  Performance caches

1452	   In addition to protocol state, Protocol Instances should provide data
1453	   into a performance-oriented cache to help guide future protocol and
1454	   path selection.  Some performance information can be gathered
1455	   generically across several protocols to allow predictive comparisons
1456	   between protocols on given paths:

1458	   *  Observed Round Trip Time

1460	   *  Connection Establishment latency

1462	   *  Connection Establishment success rate

1464	   These items can be cached on a per-address and per-subnet
1465	   granularity, and averaged between different values.  The information
1466	   should be cached on a per-network basis, since it is expected that
1467	   different network attachments will have different performance
1468	   characteristics.  Besides Protocol Instances, other system entities
1469	   may also provide data into performance-oriented caches.  This could
1470	   for instance be signal strength information reported by radio modems
1471	   like Wi-Fi and mobile broadband or information about the battery-
1472	   level of the device.  Furthermore, the system may cache the observed
1473	   maximum throughput on a path as an estimate of the available
1474	   bandwidth.

1476	   An implementation should use this information, when possible, to
1477	   determine preference between candidate paths, endpoints, and protocol
1478	   options.  Eligible options that historically had significantly better
1479	   performance than others should be selected first when gathering
1480	   candidates (see Section 4.1) to ensure better performance for the
1481	   application.

1483	   The reasonable lifetime for cached performance values will vary
1484	   depending on the nature of the value.  Certain information, like the
1485	   connection establishment success rate to a Remote Endpoint using a
1486	   given protocol stack, can be stored for a long period of time (hours
1487	   or longer), since it is expected that the capabilities of the Remote
1488	   Endpoint are not changing very quickly.  On the other hand, the Round
1489	   Trip Time observed by TCP over a particular network path may vary
1490	   over a relatively short time interval.  For such values, the
1491	   implementation should remove them from the cache more quickly, or
1492	   treat older values with less confidence/weight.
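
   One way to give older values less weight is an age-based weighted
   average, sketched below in Python.  The exponential decay and the
   "half_life" parameter are assumptions chosen for illustration; this
   document does not prescribe a particular weighting.

```python
from typing import Iterable, Optional, Tuple


def weighted_estimate(samples: Iterable[Tuple[float, float]],
                      now: float, half_life: float) -> Optional[float]:
    """Average (timestamp, value) samples, halving the weight of a
    sample for every `half_life` seconds of age."""
    total = 0.0
    weight_sum = 0.0
    for timestamp, value in samples:
        weight = 0.5 ** ((now - timestamp) / half_life)
        total += weight * value
        weight_sum += weight
    if weight_sum == 0.0:
        return None  # no usable samples
    return total / weight_sum
```

   A short half-life suits values such as Round Trip Time, while a much
   longer one suits values such as the establishment success rate.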

1494	   [I-D.ietf-tcpm-2140bis] provides guidance about sharing of TCP
1495	   Control Block information between connections on initialization.

1497	10.  Specific Transport Protocol Considerations

1499	   Each protocol that can run as part of a Transport Services
1500	   implementation defines both its API mapping as well as implementation
1501	   details.  API mappings for a protocol apply most to Connections in
1502	   which the given protocol is the "top" of the Protocol Stack.  For
1503	   example, the mapping of the "Send" function for TCP applies to
1504	   Connections in which the application directly sends over TCP.  If
1505	   HTTP/2 is used on top of TCP, the HTTP/2 mappings take precedence.

1507	   Each protocol has a notion of Connectedness.  Possible values for
1508	   Connectedness are:

1510	   *  Unconnected.  Unconnected protocols do not establish explicit
1511	      state between endpoints, and do not perform a handshake during
1512	      Connection establishment.

1514	   *  Connected.  Connected protocols establish state between endpoints,
1515	      and perform a handshake during Connection establishment.  The
1516	      handshake may be 0-RTT to send data or resume a session, but
1517	      bidirectional traffic is required to confirm connectedness.

1519	   *  Multiplexing Connected.  Multiplexing Connected protocols share
1520	      properties with Connected protocols, but also explicitly support
1521	      opening multiple application-level flows.  This means that they
1522	      can support cloning new Connection objects without a new explicit
1523	      handshake.

1525	   Protocols also define a notion of Data Unit.  Possible values for
1526	   Data Unit are:

1528	   *  Byte-stream.  Byte-stream protocols do not define any Message
1529	      boundaries of their own apart from the end of a stream in each
1530	      direction.

1532	   *  Datagram.  Datagram protocols define Message boundaries at the
1533	      same level of transmission, such that only complete (not partial)
1534	      Messages are supported.

1536	   *  Message.  Message protocols support Message boundaries that can be
1537	      sent and received either as complete or partial Messages.  Maximum
1538	      Message lengths can be defined, and Messages can be partially
1539	      reliable.
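
   The two notions above can be captured as simple enumerations, as in
   this Python sketch.  The type names and the "TRAITS" table are
   hypothetical; the entries for TCP, UDP, and TLS repeat the
   Connectedness and Data Unit values given for those protocols in this
   section.

```python
from dataclasses import dataclass
from enum import Enum


class Connectedness(Enum):
    UNCONNECTED = "Unconnected"
    CONNECTED = "Connected"
    MULTIPLEXING_CONNECTED = "Multiplexing Connected"


class DataUnit(Enum):
    BYTE_STREAM = "Byte-stream"
    DATAGRAM = "Datagram"
    MESSAGE = "Message"


@dataclass(frozen=True)
class ProtocolTraits:
    connectedness: Connectedness
    data_unit: DataUnit

    def clone_needs_handshake(self) -> bool:
        # Unconnected protocols never handshake, and Multiplexing
        # Connected protocols can clone without a new explicit one.
        return self.connectedness is Connectedness.CONNECTED

    def preserves_message_boundaries(self) -> bool:
        # Byte-streams only mark the end of the stream in each direction.
        return self.data_unit is not DataUnit.BYTE_STREAM


TRAITS = {
    "TCP": ProtocolTraits(Connectedness.CONNECTED, DataUnit.BYTE_STREAM),
    "UDP": ProtocolTraits(Connectedness.UNCONNECTED, DataUnit.DATAGRAM),
    "TLS": ProtocolTraits(Connectedness.CONNECTED, DataUnit.BYTE_STREAM),
}
```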

1541	   Below, terms in capitals with a dot (e.g., "CONNECT.SCTP") refer to
1542	   the primitives with the same name in section 4 of [RFC8303].  For
1543	   further implementation details, the description of these primitives
1544	   in [RFC8303] points to section 3 of [RFC8303] and section 3 of
1545	   [RFC8304], which refers back to the relevant specifications for each
1546	   protocol.  This back-tracking method applies to all elements of
1547	   [I-D.ietf-taps-minset] (see appendix D of [I-D.ietf-taps-interface]):
1548	   they are listed in appendix A of [I-D.ietf-taps-minset] with an
1549	   implementation hint in the same style, pointing back to section 4 of
1550	   [RFC8303].

1552	10.1.  TCP

1554	   Connectedness: Connected

1556	   Data Unit: Byte-stream

1558	   API mappings for TCP are as follows:

1560	   Connection Object:  TCP connections between two hosts map directly to
1561	      Connection objects.

1563	   Initiate:  CONNECT.TCP.  Calling "Initiate" on a TCP Connection
1564	      causes it to reserve a local port, and send a SYN to the Remote
1565	      Endpoint.

1567	   InitiateWithSend:  CONNECT.TCP with parameter "user message".  Early
1568	      safely replayable data is sent on a TCP Connection in the SYN, as
1569	      TCP Fast Open data.

1571	   Ready:  A TCP Connection is ready once the three-way handshake is
1572	      complete.

1574	   InitiateError:  Failure of CONNECT.TCP.  TCP can throw various errors
1575	      during connection setup.  Specifically, it is important to handle
1576	      a RST being sent by the peer during the handshake.

1578	   ConnectionError:  Once established, TCP throws errors whenever the
1579	      connection is disconnected, such as due to receiving a RST from
1580	      the peer, or hitting a TCP retransmission timeout.

1582	   Listen:  LISTEN.TCP.  Calling "Listen" for TCP binds a local port and
1583	      prepares it to receive inbound SYN packets from peers.

1585	   ConnectionReceived:  TCP Listeners will deliver new connections once
1586	      they have replied to an inbound SYN with a SYN-ACK.

1588	   Clone:  Calling "Clone" on a TCP Connection creates a new Connection
1589	      with equivalent parameters.  The two Connections are otherwise
1590	      independent.

1592	   Send:  SEND.TCP.  TCP does not on its own preserve Message
1593	      boundaries.  Calling "Send" on a TCP connection lays out the bytes
1594	      on the TCP send stream without any other delineation.  Any Message
1595	      marked as Final will cause TCP to send a FIN once the Message has
1596	      been completely written, by calling CLOSE.TCP immediately upon
1597	      successful termination of SEND.TCP.

1599	   Receive:  With RECEIVE.TCP, TCP delivers a stream of bytes without
1600	      any Message delineation.  All data delivered in the "Received" or
1601	      "ReceivedPartial" event will be part of a single stream-wide
1602	      Message that is marked Final (unless a Message Framer is used).
1603	      EndOfMessage will be delivered when the TCP Connection has
1604	      received a FIN (CLOSE-EVENT.TCP or ABORT-EVENT.TCP) from the peer.

1606	   Close:  Calling "Close" on a TCP Connection indicates that the
1607	      Connection should be gracefully closed (CLOSE.TCP) by sending a
1608	      FIN to the peer and waiting for a FIN-ACK before delivering the
1609	      "Closed" event.

1611	   Abort:  Calling "Abort" on a TCP Connection indicates that the
1612	      Connection should be immediately closed by sending a RST to the
1613	      peer (ABORT.TCP).

1615	10.2.  UDP

1617	   Connectedness: Unconnected

1619	   Data Unit: Datagram

1621	   API mappings for UDP are as follows:

1623	   Connection Object:  UDP connections represent a pair of specific IP
1624	      addresses and ports on two hosts.

1626	   Initiate:  CONNECT.UDP.  Calling "Initiate" on a UDP Connection
1627	      causes it to reserve a local port, but does not generate any
1628	      traffic.

1630	   InitiateWithSend:  Early data on a UDP Connection does not have any
1631	      special meaning.  The data is sent whenever the Connection is
1632	      Ready.

1634	   Ready:  A UDP Connection is ready once the system has reserved a
1635	      local port and has a path to send to the Remote Endpoint.

1637	   InitiateError:  UDP Connections can only generate errors on
1638	      initiation due to port conflicts on the local system.

1640	   ConnectionError:  Once in use, UDP throws "soft errors" (ERROR.UDP(-
1641	      Lite)) upon receiving ICMP notifications indicating failures in
1642	      the network.

1644	   Listen:  LISTEN.UDP.  Calling "Listen" for UDP binds a local port and
1645	      prepares it to receive inbound UDP datagrams from peers.

1647	   ConnectionReceived:  UDP Listeners will deliver new connections once
1648	      they have received traffic from a new Remote Endpoint.

1650	   Clone:  Calling "Clone" on a UDP Connection creates a new Connection
1651	      with equivalent parameters.  The two Connections are otherwise
1652	      independent.

1654	   Send:  SEND.UDP(-Lite).  Calling "Send" on a UDP connection sends the
1655	      data as the payload of a complete UDP datagram.  Marking Messages
1656	      as Final does not change anything in the datagram's contents.
1657	      Upon sending a UDP datagram, some relevant fields and flags in the
1658	      IP header can be controlled: DSCP (SET_DSCP.UDP(-Lite)), DF in
1659	      IPv4 (SET_DF.UDP(-Lite)) and ECN flag (SET_ECN.UDP(-Lite)).

1661	   Receive:  RECEIVE.UDP(-Lite).  UDP only delivers complete Messages to
1662	      "Received", each of which represents a single datagram received in
1663	      a UDP packet.  Upon receiving a UDP datagram, the ECN flag from
1664	      the IP header can be obtained (GET_ECN.UDP(-Lite)).

1666	   Close:  Calling "Close" on a UDP Connection (ABORT.UDP(-Lite))
1667	      releases the local port reservation.

1669	   Abort:  Calling "Abort" on a UDP Connection (ABORT.UDP(-Lite)) is
1670	      identical to calling "Close".

1672	10.3.  UDP Multicast Receive

1674	   Connectedness: Unconnected

1676	   Data Unit: Datagram

1678	   API mappings for Receiving Multicast UDP are as follows:

1680	   Connection Object:  Established UDP Multicast Receive connections
1681	      represent a pair of specific IP addresses and ports.  The
1682	      "unidirectional receive" transport property is required, and the
1683	      local endpoint must be configured with a group IP address and a
1684	      port.

1686	   Initiate:  Calling "Initiate" on a UDP Multicast Receive Connection
1687	      causes an immediate InitiateError.  This is an unsupported
1688	      operation.

1690	   InitiateWithSend:  Calling "InitiateWithSend" on a UDP Multicast
1691	      Receive Connection causes an immediate InitiateError.  This is an
1692	      unsupported operation.

1694	   Ready:  A UDP Multicast Receive Connection is ready once the system
1695	      has received traffic for the appropriate group and port.

1697	   InitiateError:  UDP Multicast Receive Connections generate an
1698	      InitiateError if Initiate is called.

1700	   ConnectionError:  Once in use, UDP throws "soft errors" (ERROR.UDP(-
1701	      Lite)) upon receiving ICMP notifications indicating failures in
1702	      the network.

1704	   Listen:  LISTEN.UDP.  Calling "Listen" for UDP Multicast Receive
1705	      binds a local port, prepares it to receive inbound UDP datagrams
1706	      from peers, and issues a multicast host join.  If a remote
1707	      endpoint with an address is supplied, the join is Source-specific
1708	      Multicast, and the path selection is based on the route to the
1709	      remote endpoint.  If a remote endpoint is not supplied, the join
1710	      is Any-source Multicast, and the path selection is based on the
1711	      outbound route to the group supplied in the local endpoint.

1713	   ConnectionReceived:  UDP Multicast Receive Listeners will deliver new
1714	      connections once they have received traffic from a new Remote
1715	      Endpoint.

1717	   Clone:  Calling "Clone" on a UDP Multicast Receive Connection creates
1718	      a new Connection with equivalent parameters.  The two Connections
1719	      are otherwise independent.

1721	   Send:  SEND.UDP(-Lite).  Calling "Send" on a UDP Multicast Receive
1722	      connection causes an immediate SendError.  This is an unsupported
1723	      operation.

1725	   Receive:  RECEIVE.UDP(-Lite).  The Receive operation in a UDP
1726	      Multicast Receive connection only delivers complete Messages to
1727	      "Received", each of which represents a single datagram received in
1728	      a UDP packet.  Upon receiving a UDP datagram, the ECN flag from
1729	      the IP header can be obtained (GET_ECN.UDP(-Lite)).

1731	   Close:  Calling "Close" on a UDP Multicast Receive Connection
1732	      (ABORT.UDP(-Lite)) releases the local port reservation and leaves
1733	      the group.

1735	   Abort:  Calling "Abort" on a UDP Multicast Receive Connection
1736	      (ABORT.UDP(-Lite)) is identical to calling "Close".
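
   As an illustration of the "Listen" behavior described above, the
   following minimal Python sketch chooses between a Source-specific
   and an Any-source join and records which address drives path
   selection.  The names "MulticastJoinPlan" and "plan_multicast_join"
   are hypothetical and are not part of the Transport Services API; no
   sockets are opened.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MulticastJoinPlan:
    kind: str               # "source-specific" or "any-source"
    group: str              # multicast group from the Local Endpoint
    port: int               # local port to bind
    source: Optional[str]   # sender address for SSM, None for ASM
    route_lookup: str       # address whose route drives path selection


def plan_multicast_join(group: str, port: int,
                        remote_address: Optional[str] = None) -> MulticastJoinPlan:
    """Decide the join type for a UDP Multicast Receive Listener."""
    if remote_address is not None:
        # Remote endpoint with an address: Source-specific Multicast,
        # path selection follows the route to the remote endpoint.
        return MulticastJoinPlan("source-specific", group, port,
                                 remote_address, remote_address)
    # No remote endpoint: Any-source Multicast, path selection follows
    # the outbound route to the group.
    return MulticastJoinPlan("any-source", group, port, None, group)
```

   For example, plan_multicast_join("233.252.0.1", 5000) yields an
   Any-source plan whose route lookup targets the group itself.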

1738	10.4.  TLS

1740	   The mapping of a TLS stream abstraction into the application is
1741	   equivalent to the contract provided by TCP (see Section 10.1), and
1742	   builds upon many of the actions of TCP connections.

1744	   Connectedness: Connected

1746	   Data Unit: Byte-stream

1748	   Connection Object:  Connection objects represent a single TLS
1749	      connection running over a TCP connection between two hosts.

1751	   Initiate:  Calling "Initiate" on a TLS Connection causes it to first
1752	      initiate a TCP connection.  Once the TCP protocol is Ready, the
1753	      TLS handshake will be performed as a client (starting by sending a
1754	      "client_hello", and so on).

1756	   InitiateWithSend:  TLS 1.3 supports early, safely replayable data:
1757	      when a session is being resumed, encrypted application data is
1758	      sent in the first TLS message.  For older versions of TLS, or
1759	      if a session is not being resumed, the initial data will be
1760	      delayed until the TLS handshake is complete.  TCP Fast Open can
1761	      also be enabled automatically.

1763	   Ready:  A TLS Connection is ready once the underlying TCP connection
1764	      is Ready, the TLS handshake is complete, and keys have been
1765	      established to encrypt application data.

1767	   InitiateError:  In addition to TCP initiation errors, TLS can
1768	      generate errors during its handshake.  Examples of errors include
1769	      a failure of the peer to successfully authenticate, the peer
1770	      rejecting the local authentication, or a failure to match versions
1771	      or algorithms.

1773	   ConnectionError:  TLS connections will generate TCP errors, or errors
1774	      due to failures to rekey or decrypt received messages.

1776	   Listen:  Calling "Listen" for TLS listens on TCP, and sets up
1777	      received connections to perform server-side TLS handshakes.

1779	   ConnectionReceived:  TLS Listeners will deliver new connections once
1780	      they have successfully completed both TCP and TLS handshakes.

1782	   Clone:  As with TCP, calling "Clone" on a TLS Connection creates a
1783	      new Connection with equivalent parameters.  The two Connections
1784	      are otherwise independent.

1786	   Send:  Like TCP, TLS does not preserve message boundaries.  Although
1787	      application data is framed natively in TLS, there is not a general
1788	      guarantee that these TLS messages represent semantically
1789	      meaningful application stream boundaries.  Rather, sending data on
1790	      a TLS Connection only guarantees that the application data will be
1791	      transmitted in an encrypted form.  Marking Messages as Final
1792	      causes a "close_notify" to be generated once the data has been
1793	      written.

1795	   Receive:  Like TCP, TLS delivers a stream of bytes without any
1796	      Message delineation.  The data is decrypted prior to being
1797	      delivered to the application.  If a "close_notify" is received,
1798	      the stream-wide Message will be delivered with EndOfMessage set.

1800	   Close:  Calling "Close" on a TLS Connection indicates that the
1801	      Connection should be gracefully closed by sending a "close_notify"
1802	      to the peer and waiting for a corresponding "close_notify" before
1803	      delivering the "Closed" event.

1805	   Abort:  Calling "Abort" on a TLS Connection indicates that the
1806	      Connection should be immediately closed by sending a
1807	      "close_notify", optionally preceded by "user_canceled", to the
1808	      peer.  Implementations do not need to wait to receive
1809	      "close_notify" before delivering the "Closed" event.
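
   The "InitiateWithSend" rule above can be sketched as a small
   decision function.  This is only an illustration: the function name
   and its parameters are hypothetical, and the "safely_replayable"
   flag stands for whatever indication the application gives that the
   first Message may be replayed.

```python
def first_message_timing(tls_version: str, resuming_session: bool,
                         safely_replayable: bool) -> str:
    """Return when the first application Message can be sent over TLS."""
    # Early data needs TLS 1.3, a resumed session, and data that the
    # application has marked as safe to replay.
    if tls_version == "1.3" and resuming_session and safely_replayable:
        return "early-data"
    # Otherwise the data waits until the TLS handshake is complete.
    return "after-handshake"
```

   Any other combination of inputs, such as an older TLS version or a
   fresh session, falls back to sending after the handshake.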

1811	10.5.  DTLS

1813	   DTLS follows the same behavior as TLS (Section 10.4), with the
1814	   notable exception of not inheriting behavior directly from TCP.
1815	   Differences from TLS are detailed below, and all cases not explicitly
1816	   mentioned should be considered the same as TLS.

1818	   Connectedness: Connected

1820	   Data Unit: Datagram

1822	   Connection Object:  Connection objects represent a single DTLS
1823	      connection running over a set of UDP ports between two hosts.

1825	   Initiate:  Calling "Initiate" on a DTLS Connection causes it to
1826	      reserve a local UDP port and begin sending handshake messages to
1827	      the peer over UDP.  These messages are reliable, and will be
1828	      automatically retransmitted.

1830	   Ready:  A DTLS Connection is ready once the DTLS handshake is
1831	      complete and keys are established to encrypt application data.

1833	   Send:  Sending over DTLS preserves message boundaries in the same
1834	      way that UDP datagrams do.  As with TLS, marking a Message as
1835	      Final sends a "close_notify".

1837	   Receive:  Receiving over DTLS delivers one decrypted Message for each
1838	      received DTLS datagram.  If a "close_notify" is received, a
1839	      Message will be delivered that is marked as Final.

1841	10.6.  HTTP

1843	   HTTP requests and responses map naturally into Messages, since they
1844	   are delineated chunks of data with metadata that can be sent over a
1845	   transport.  To that end, HTTP can be seen as the most prevalent
1846	   framing protocol that runs on top of streams like TCP, TLS, etc.

1848	   In order to use a transport Connection that provides HTTP Message
1849	   support, the establishment and closing of the connection can be
1850	   treated as it would without the framing protocol.  Sending and
1851	   receiving of Messages, however, changes to treat each Message as a
1852	   well-delineated HTTP request or response, with the content of the
1853	   Message representing the body, and the Headers being provided in
1854	   Message metadata.

1856	   Connectedness: Multiplexing Connected

1858	   Data Unit: Message

1859	   Connection Object:  Connection objects represent a flow of HTTP
1860	      messages between a client and a server, which may be an HTTP/1.1
1861	      connection over TCP, or a single stream in an HTTP/2 connection.

1863	   Initiate:  Calling "Initiate" on an HTTP connection initiates a TCP
1864	      or TLS connection as a client.

1866	   Clone:  Calling "Clone" on an HTTP Connection opens a new stream on
1867	      an existing HTTP/2 connection when possible.  If the underlying
1868	      version does not support multiplexed streams, calling "Clone"
1869	      simply creates a new parallel connection.

1871	   Send:  When an application sends an HTTP Message, it is expected to
1872	      provide HTTP header values as a MessageContext in a canonical
1873	      form, along with any associated HTTP message body as the Message
1874	      data.  The HTTP header values are encoded in the specific version
1875	      format upon sending.

1877	   Receive:  HTTP Connections deliver Messages in which HTTP header
1878	      values are attached to MessageContexts, and HTTP bodies are
1879	      carried in Message data.

1881	   Close:  Calling "Close" on an HTTP Connection will only close the
1882	      underlying TLS or TCP connection if the HTTP version does not
1883	      support multiplexing.  For HTTP/2, for example, closing the
1884	      connection only closes a specific stream.

1886	10.7.  QUIC

1888	   QUIC provides a multi-streaming interface to an encrypted transport.
1889	   Each stream can be viewed as equivalent to a TLS stream over TCP, so
1890	   a natural mapping is to present each QUIC stream as an individual
1891	   Connection.  The protocol for the stream will be considered Ready
1892	   whenever the underlying QUIC connection is established to the point
1893	   that this stream's data can be sent.  For streams after the first
1894	   stream, this will likely be an immediate operation.

1896	   Closing a single QUIC stream, presented to the application as a
1897	   Connection, does not imply closing the underlying QUIC connection
1898	   itself.  Rather, the implementation may choose to close the QUIC
1899	   connection once all streams have been closed (often after some
1900	   timeout), or after an individual stream Connection sends an Abort.
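
   One possible lifetime policy for the underlying QUIC connection is
   sketched below in Python.  It is a simplified illustration, not a
   QUIC implementation: the class and method names are hypothetical,
   the idle timeout mentioned above is omitted, and closing the whole
   connection on "Abort" is only one of the choices the text permits.

```python
from typing import Set


class MultiplexedTransport:
    """Tracks stream Connections that share one underlying connection."""

    def __init__(self) -> None:
        self.open_streams: Set[int] = set()
        self.transport_open = True
        self._next_id = 0

    def open_stream(self) -> int:
        if not self.transport_open:
            raise RuntimeError("underlying connection is closed")
        stream_id = self._next_id
        self._next_id += 1
        self.open_streams.add(stream_id)
        return stream_id

    def close_stream(self, stream_id: int) -> None:
        # Closing one stream leaves the underlying connection alone
        # unless it was the last open stream.
        self.open_streams.discard(stream_id)
        if not self.open_streams:
            # A real system would likely wait for a timeout first.
            self.transport_open = False

    def abort_stream(self, stream_id: int) -> None:
        # Policy assumed here: an Abort tears down everything.
        self.open_streams.clear()
        self.transport_open = False
```

   With two open streams, closing the first keeps the transport open;
   closing the second, or aborting either, closes it.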

1902	   Connectedness: Multiplexing Connected

1904	   Data Unit: Stream

1906	   Connection Object:  Connection objects represent a single QUIC stream
1907	      on a QUIC connection.

1909	10.8.  HTTP/2 transport

1911	   Similar to QUIC (Section 10.7), HTTP/2 provides a multi-streaming
1912	   interface.  This will generally use HTTP as the unit of Messages over
1913	   the streams, in which each stream can be represented as a transport
1914	   Connection.  The lifetime of streams and the HTTP/2 connection should
1915	   be managed as described for QUIC.

1917	   It is possible to treat each HTTP/2 stream as a raw byte-stream
1918	   instead of a carrier for HTTP messages, in which case the Messages
1919	   over the streams can be represented similarly to the TCP stream (one
1920	   Message per direction, see Section 10.1).

1922	   Connectedness: Multiplexing Connected

1924	   Data Unit: Stream

1926	   Connection Object:  Connection objects represent a single HTTP/2
1927	      stream on an HTTP/2 connection.

1929	10.9.  SCTP

1931	   Connectedness: Connected

1933	   Data Unit: Message

1935	   API mappings for SCTP are as follows:

1937	   Connection Object:  Connection objects represent a flow of SCTP
1938	      messages between a client and a server, which may be an SCTP
1939	      association or a stream in an SCTP association.  How to map
1940	      Connection objects to streams is described in [NEAT-flow-mapping];
1941	      in the following, a similar method is described.  To map
1942	      Connection objects to SCTP streams without head-of-line blocking
1943	      on the sender side, both the sending and receiving SCTP
1944	      implementation must support message interleaving [RFC8260].  Both
1945	      SCTP implementations must also support stream reconfiguration.
1946	      Finally, both communicating endpoints must be aware of this
1947	      intended multiplexing; [NEAT-flow-mapping] describes a way for a
1948	      Transport System to negotiate the stream mapping capability using
1949	      SCTP's adaptation layer indication, such that this functionality
1950	      would only take effect if both sides are aware of it.  The
1951	      first flow, for which the SCTP association has been created, will
1952	      always use stream id zero.  All additional flows are assigned to
1953	      unused stream ids in growing order.  To avoid a conflict when both
1954	      endpoints map new flows simultaneously, the peer which initiated
1955	      the transport connection will use even stream numbers whereas the
1956	      remote side will map its flows to odd stream numbers.  Both sides
1957	      maintain a status map of the assigned stream numbers.  Generally,
1958	      new streams must consume the lowest available (even or odd,
1959	      depending on the side) stream number; this rule is relevant when
1960	      lower numbers become available because Connection objects
1961	      associated to the streams are closed.

1963	   Initiate:  If this is the only Connection object that is assigned to
1964	      the SCTP association or stream mapping has not been negotiated,
1965	      CONNECT.SCTP is called.  Else, a new stream is used: if there are
1966	      enough streams available, "Initiate" is just a local operation
1967	      that assigns a new stream number to the Connection object.  The
1968	      number of streams is negotiated as a parameter of the prior
1969	      CONNECT.SCTP call, and it represents a trade-off between local
1970	      resource usage and the number of Connection objects that can be
1971	      mapped without requiring a reconfiguration signal.  When running
1972	      out of streams, ADD_STREAM.SCTP must be called.

1974	   InitiateWithSend:  If this is the only Connection object that is
1975	      assigned to the SCTP association or stream mapping has not been
1976	      negotiated, CONNECT.SCTP is called with the "user message"
1977	      parameter.  Else, a new stream is used (see "Initiate" for how to
1978	      handle running out of streams), and this just sends the first
1979	      message on a new stream.

1981	   Ready:  "Initiate" or "InitiateWithSend" returns without an error,
1982	      i.e. SCTP's four-way handshake has completed.  If an association
1983	      with the peer already exists, and stream mapping has been
1984	      negotiated and enough streams are available, a Connection Object
1985	      instantly becomes Ready after calling "Initiate" or
1986	      "InitiateWithSend".

1988	   InitiateError:  Failure of CONNECT.SCTP.

1990	   ConnectionError:  TIMEOUT.SCTP or ABORT-EVENT.SCTP.

1992	   Listen:  LISTEN.SCTP.  If an association with the peer already exists
1993	      and stream mapping has been negotiated, "Listen" just expects to
1994	      receive a new message on a new stream id (chosen in accordance
1995	      with the stream number assignment procedure described above).

1997	   ConnectionReceived:  LISTEN.SCTP returns without an error (a result
1998	      of successful CONNECT.SCTP from the peer), or, in case of stream
1999	      mapping, the first message has arrived on a new stream (in this
2000	      case, "Receive" is also invoked).

2002	   Clone:  Calling "Clone" on an SCTP association creates a new
2003	      Connection object and assigns it a new stream number in accordance
2004	      with the stream number assignment procedure described above.  If
2005	      there are not enough streams available, ADD_STREAM.SCTP must be
2006	      called.

2008	   Priority (Connection):  When this value is changed, or a Message with
2009	      Message Property "Priority" is sent, and there are multiple
2010	      Connection objects assigned to the same SCTP association,
2011	      CONFIGURE_STREAM_SCHEDULER.SCTP is called to adjust the priorities
2012	      of streams in the SCTP association.

2014	   Send:  SEND.SCTP.  Message Properties such as "Lifetime" and
2015	      "Ordered" map to parameters of this primitive.

2017	   Receive:  RECEIVE.SCTP.  The "partial flag" of RECEIVE.SCTP invokes a
2018	      "ReceivedPartial" event.

2020	   Close:  If this is the only Connection object that is assigned to the
2021	      SCTP association, CLOSE.SCTP is called.  Otherwise, several
2022	      Connection objects share the SCTP association, and
2023	      RESET_STREAM.SCTP must be called.  This informs the peer that the
2024	      stream will no longer be used for mapping and can be used by
2025	      future "Initiate", "InitiateWithSend" or "Listen" calls.  At the
2026	      peer, the event RESET_STREAM-EVENT.SCTP will fire, which the peer
2027	      must answer by issuing RESET_STREAM.SCTP too.  The resulting local
2028	      RESET_STREAM-EVENT.SCTP informs the transport system that the
2029	      stream number can now be re-used by the next "Initiate",
2030	      "InitiateWithSend" or "Listen" calls.

2032	   Abort:  If this is the only Connection object that is assigned to the
2033	      SCTP association, ABORT.SCTP is called.  Otherwise, several
2034	      Connection objects share the SCTP association, and shutdown
2035	      proceeds as described under "Close".
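
   The stream number assignment procedure and the choice of teardown
   primitive described in this section can be sketched as follows.
   This is a minimal Python illustration under stated assumptions: the
   names "StreamIdAllocator" and "teardown_primitive" are hypothetical,
   only the local side's stream ids are tracked, and no SCTP calls are
   made; the returned strings merely name the primitive that would be
   invoked.

```python
from typing import Optional, Set


class StreamIdAllocator:
    """Hands out SCTP stream ids for Connection objects on one association."""

    def __init__(self, is_initiator: bool, negotiated_streams: int) -> None:
        # The side that initiated the association uses even ids
        # (0, 2, ...); the remote side uses odd ids (1, 3, ...).
        self._first_id = 0 if is_initiator else 1
        self._negotiated_streams = negotiated_streams
        self._in_use: Set[int] = set()

    def allocate(self) -> Optional[int]:
        """Return the lowest free id of this side's parity, or None if
        the negotiated streams are exhausted (ADD_STREAM.SCTP needed)."""
        for stream_id in range(self._first_id, self._negotiated_streams, 2):
            if stream_id not in self._in_use:
                self._in_use.add(stream_id)
                return stream_id
        return None

    def release(self, stream_id: int) -> None:
        """Mark an id as reusable, e.g. once the stream reset completes."""
        self._in_use.discard(stream_id)


def teardown_primitive(action: str, connections_on_association: int) -> str:
    """Name the primitive used for "close" or "abort" on a Connection."""
    if action not in ("close", "abort"):
        raise ValueError("action must be 'close' or 'abort'")
    if connections_on_association < 1:
        raise ValueError("at least one Connection object is required")
    if connections_on_association == 1:
        return "CLOSE.SCTP" if action == "close" else "ABORT.SCTP"
    # Several Connection objects share the association: only the
    # stream is reset, for Close and Abort alike.
    return "RESET_STREAM.SCTP"
```

   An initiator with six negotiated streams is handed ids 0, 2 and 4;
   after releasing 2, the next allocation returns 2 again because the
   lowest available number is always consumed first.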

2038	11.  IANA Considerations

2040	   RFC-EDITOR: Please remove this section before publication.

2042	   This document has no actions for IANA.

2044	12.  Security Considerations

2046	   [I-D.ietf-taps-arch] outlines general security considerations and
2047	   requirements for any system that implements the TAPS architecture.
2048	   [I-D.ietf-taps-interface] provides further discussion on security and
2049	   privacy implications of the TAPS API.  This document provides
2050	   additional guidance on implementation specifics for the TAPS API and
2051	   as such the security considerations in both of these documents apply.
2052	   The next two subsections discuss further considerations that are
2053	   specific to mechanisms specified in this document.

2055	12.1.  Considerations for Candidate Gathering

2057	   Implementations should avoid downgrade attacks that allow network
2058	   interference to cause the implementation to select less secure, or
2059	   entirely insecure, combinations of paths and protocols.

2061	12.2.  Considerations for Candidate Racing

2063	   See Section 5.3 for security considerations around racing with 0-RTT
2064	   data.

2066	   An attacker that knows a particular device is racing several options
2067	   during connection establishment may be able to block packets for the
2068	   first connection attempt, thus inducing the device to fall back to a
2069	   secondary attempt.  This is a problem if the secondary attempts have
2070	   worse security properties that enable further attacks.
2071	   Implementations should ensure that all options have equivalent
2072	   security properties to avoid incentivizing attacks.

2074	   Since results from the network can determine how a connection attempt
2075	   tree is built, such as when DNS returns a list of resolved endpoints,
2076	   it is possible for the network to cause an implementation to consume
2077	   significant on-device resources.  Implementations should limit the
2078	   maximum amount of state allowed for any given node, including the
2079	   number of child nodes, especially when the state is based on results
2080	   from the network.
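
   A minimal Python sketch of such a limit is shown below.  The limit
   value and the function name are hypothetical; an implementation
   would choose its own bound and could apply it at every level of the
   connection attempt tree.

```python
from typing import List, Sequence

MAX_CHILDREN_PER_NODE = 8  # hypothetical bound, chosen for illustration


def cap_children(resolved_endpoints: Sequence[str],
                 limit: int = MAX_CHILDREN_PER_NODE) -> List[str]:
    """Bound the child nodes created from network-supplied results."""
    if limit < 0:
        raise ValueError("limit must not be negative")
    # Drop duplicates while preserving the order given by the resolver,
    # then keep at most `limit` entries.
    unique = list(dict.fromkeys(resolved_endpoints))
    return unique[:limit]
```

   A resolver answer that lists hundreds of addresses then produces at
   most eight child nodes, regardless of what the network returned.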

2082	13.  Acknowledgements

2084	   This work has received funding from the European Union's Horizon 2020
2085	   research and innovation programme under grant agreement No. 644334
2086	   (NEAT).

2088	   This work has been supported by Leibniz Prize project funds of DFG -
2089	   German Research Foundation: Gottfried Wilhelm Leibniz-Preis 2011 (FKZ
2090	   FE 570/4-1).

2092	   This work has been supported by the UK Engineering and Physical
2093	   Sciences Research Council under grant EP/R04144X/1.

2095	   This work has been supported by the Research Council of Norway under
2096	   its "Toppforsk" programme through the "OCARINA" project.

2098	   Thanks to Stuart Cheshire, Josh Graessley, David Schinazi, and Eric
2099	   Kinnear for their implementation and design efforts, including Happy
2100	   Eyeballs, that heavily influenced this work.

2102	14.  References

2104	14.1.  Normative References

2106	   [I-D.ietf-taps-arch]
2107	              Pauly, T., Trammell, B., Brunstrom, A., Fairhurst, G.,
2108	              Perkins, C., Tiesel, P., and C. Wood, "An Architecture for
2109	              Transport Services", Work in Progress, Internet-Draft,
2110	              draft-ietf-taps-arch-07, 9 March 2020,
2111	              <http://www.ietf.org/internet-drafts/draft-ietf-taps-arch-
2112	              07.txt>.

2114	   [I-D.ietf-taps-interface]
2115	              Trammell, B., Welzl, M., Enghardt, T., Fairhurst, G.,
2116	              Kuehlewind, M., Perkins, C., Tiesel, P., Wood, C., and T.
2117	              Pauly, "An Abstract Application Layer Interface to
2118	              Transport Services", Work in Progress, Internet-Draft,
2119	              draft-ietf-taps-interface-06, 9 March 2020,
2120	              <http://www.ietf.org/internet-drafts/draft-ietf-taps-
2121	              interface-06.txt>.

2123	   [I-D.ietf-taps-minset]
2124	              Welzl, M. and S. Gjessing, "A Minimal Set of Transport
2125	              Services for End Systems", Work in Progress, Internet-
2126	              Draft, draft-ietf-taps-minset-11, 27 September 2018,
2127	              <http://www.ietf.org/internet-drafts/draft-ietf-taps-
2128	              minset-11.txt>.

2130	   [RFC7413]  Cheng, Y., Chu, J., Radhakrishnan, S., and A. Jain, "TCP
2131	              Fast Open", RFC 7413, DOI 10.17487/RFC7413, December 2014,
2132	              <https://www.rfc-editor.org/info/rfc7413>.

2134	   [RFC7540]  Belshe, M., Peon, R., and M. Thomson, Ed., "Hypertext
2135	              Transfer Protocol Version 2 (HTTP/2)", RFC 7540,
2136	              DOI 10.17487/RFC7540, May 2015,
2137	              <https://www.rfc-editor.org/info/rfc7540>.

2139	   [RFC8260]  Stewart, R., Tuexen, M., Loreto, S., and R. Seggelmann,
2140	              "Stream Schedulers and User Message Interleaving for the
2141	              Stream Control Transmission Protocol", RFC 8260,
2142	              DOI 10.17487/RFC8260, November 2017,
2143	              <https://www.rfc-editor.org/info/rfc8260>.

2145	   [RFC8303]  Welzl, M., Tuexen, M., and N. Khademi, "On the Usage of
2146	              Transport Features Provided by IETF Transport Protocols",
2147	              RFC 8303, DOI 10.17487/RFC8303, February 2018,
2148	              <https://www.rfc-editor.org/info/rfc8303>.

2150	   [RFC8304]  Fairhurst, G. and T. Jones, "Transport Features of the
2151	              User Datagram Protocol (UDP) and Lightweight UDP (UDP-
2152	              Lite)", RFC 8304, DOI 10.17487/RFC8304, February 2018,
2153	              <https://www.rfc-editor.org/info/rfc8304>.

2155	   [RFC8305]  Schinazi, D. and T. Pauly, "Happy Eyeballs Version 2:
2156	              Better Connectivity Using Concurrency", RFC 8305,
2157	              DOI 10.17487/RFC8305, December 2017,
2158	              <https://www.rfc-editor.org/info/rfc8305>.

2160	   [RFC8446]  Rescorla, E., "The Transport Layer Security (TLS) Protocol
2161	              Version 1.3", RFC 8446, DOI 10.17487/RFC8446, August 2018,
2162	              <https://www.rfc-editor.org/info/rfc8446>.

2164	14.2.  Informative References

2166	   [I-D.ietf-quic-transport]
2167	              Iyengar, J. and M. Thomson, "QUIC: A UDP-Based Multiplexed
2168	              and Secure Transport", Work in Progress, Internet-Draft,
2169	              draft-ietf-quic-transport-29, 9 June 2020,
2170	              <http://www.ietf.org/internet-drafts/draft-ietf-quic-
2171	              transport-29.txt>.

2173	   [I-D.ietf-tcpm-2140bis]
2174	              Touch, J., Welzl, M., and S. Islam, "TCP Control Block
2175	              Interdependence", Work in Progress, Internet-Draft, draft-
2176	              ietf-tcpm-2140bis-05, 29 April 2020, <http://www.ietf.org/
2177	              internet-drafts/draft-ietf-tcpm-2140bis-05.txt>.

2179	   [NEAT-flow-mapping]
2180	              "Transparent Flow Mapping for NEAT", Workshop on Future of
2181	              Internet Transport (FIT 2017), 2017.

2183	   [RFC5389]  Rosenberg, J., Mahy, R., Matthews, P., and D. Wing,
2184	              "Session Traversal Utilities for NAT (STUN)", RFC 5389,
2185	              DOI 10.17487/RFC5389, October 2008,
2186	              <https://www.rfc-editor.org/info/rfc5389>.

2188	   [RFC5766]  Mahy, R., Matthews, P., and J. Rosenberg, "Traversal Using
2189	              Relays around NAT (TURN): Relay Extensions to Session
2190	              Traversal Utilities for NAT (STUN)", RFC 5766,
2191	              DOI 10.17487/RFC5766, April 2010,
2192	              <https://www.rfc-editor.org/info/rfc5766>.

2194	   [RFC6762]  Cheshire, S. and M. Krochmal, "Multicast DNS", RFC 6762,
2195	              DOI 10.17487/RFC6762, February 2013,
2196	              <https://www.rfc-editor.org/info/rfc6762>.

2198	   [RFC6763]  Cheshire, S. and M. Krochmal, "DNS-Based Service
2199	              Discovery", RFC 6763, DOI 10.17487/RFC6763, February 2013,
2200	              <https://www.rfc-editor.org/info/rfc6763>.

2202	   [RFC7657]  Black, D., Ed. and P. Jones, "Differentiated Services
2203	              (Diffserv) and Real-Time Communication", RFC 7657,
2204	              DOI 10.17487/RFC7657, November 2015,
2205	              <https://www.rfc-editor.org/info/rfc7657>.

2207	   [RFC8445]  Keranen, A., Holmberg, C., and J. Rosenberg, "Interactive
2208	              Connectivity Establishment (ICE): A Protocol for Network
2209	              Address Translator (NAT) Traversal", RFC 8445,
2210	              DOI 10.17487/RFC8445, July 2018,
2211	              <https://www.rfc-editor.org/info/rfc8445>.

2213	Appendix A.  Additional Properties

2215	   This appendix discusses implementation considerations for additional
2216	   parameters and properties that could be used to enhance transport
2217	   protocol and/or path selection, or the transmission of messages given
2218	   a Protocol Stack that implements them.  These are not part of the
2219	   interface, and may be removed from the final document, but are
2220	   presented here to support discussion within the TAPS working group as
2221	   to whether they should be added to a future revision of the base
2222	   specification.

2224	A.1.  Properties Affecting Sorting of Branches

2226	   In addition to the Protocol and Path Selection Properties discussed
2227	   in Section 4.1.5, the following properties under discussion can
2228	   influence branch sorting:

2230	   *  Bounds on Send or Receive Rate: If the application indicates a
2231	      bound on the expected Send or Receive bitrate, an implementation
2232	      may prefer a path that can likely provide the desired bandwidth,
2233	      based on cached maximum throughput, see Section 9.2.  The
2234	      application may know the Send or Receive Bitrate from metadata in
2235	      adaptive HTTP streaming, such as MPEG-DASH.

2237	   *  Cost Preferences: If the application indicates a preference to
2238	      avoid expensive paths, and some paths are associated with a
2239	      monetary cost, an implementation should decrease the ranking of
2240	      such paths.  If the application indicates that it prohibits using
2241	      expensive paths, paths that are associated with a cost should be
2242	      purged from the decision tree.
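
   The Cost Preferences behavior above could be sketched as follows.
   This Python fragment is an assumption-laden illustration: the "Path"
   type, the "expensive" flag, and the preference values "none",
   "avoid" and "prohibit" are hypothetical names, not properties
   defined by the interface.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Path:
    name: str
    expensive: bool  # True if the path is associated with a monetary cost


def apply_cost_preference(paths: Sequence[Path], preference: str) -> List[Path]:
    """Re-rank or prune candidate paths according to a cost preference."""
    if preference == "prohibit":
        # Purge costly paths from the decision tree.
        return [p for p in paths if not p.expensive]
    if preference == "avoid":
        # Stable sort: costly paths sink, other ordering is preserved.
        return sorted(paths, key=lambda p: p.expensive)
    if preference == "none":
        return list(paths)
    raise ValueError("unknown cost preference: " + preference)
```

   With the candidates cellular (expensive), Wi-Fi and Ethernet in that
   order, "avoid" yields Wi-Fi, Ethernet, cellular, while "prohibit"
   yields Wi-Fi and Ethernet only.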

2244	Appendix B.  Reasons for errors

2246	   The Transport Services API [I-D.ietf-taps-interface] allows several
2247	   generic error types to specify a more detailed reason as to why an
2248	   error occurred.  This appendix lists some of the possible reasons.

2251	   *  InvalidConfiguration: The transport properties and endpoints
2252	      provided by the application are either contradictory or
2253	      incomplete.  Examples include the lack of a remote endpoint on an
2254	      active open or using a multicast group address while not
2255	      requesting a unidirectional receive.

2257	   *  NoCandidates: The configuration is valid, but none of the
2258	      available transport protocols can satisfy the transport properties
2259	      provided by the application.

2261	   *  ResolutionFailed: The remote or local specifier provided by the
2262	      application cannot be resolved.

2264	   *  EstablishmentFailed: The TAPS system was unable to establish a
2265	      transport-layer connection to the remote endpoint specified by the
2266	      application.

2268	   *  PolicyProhibited: The system policy prevents the transport system
2269	      from performing the action requested by the application.

2271	   *  NotCloneable: The protocol stack is not capable of being cloned.

2273	   *  MessageTooLarge: The message size is too big for the transport
2274	      system to handle.

2276	   *  ProtocolFailed: The underlying protocol stack failed.

2278	   *  InvalidMessageProperties: The message properties are either
2279	      contradictory to the transport properties or they cannot be
2280	      satisfied by the transport system.

2282	   *  DeframingFailed: The data that was received by the underlying
2283	      protocol stack could not be deframed.

2285	   *  ConnectionAborted: The connection was aborted by the peer.

2287	   *  Timeout: Delivery of a message was not possible after a timeout.
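
   An implementation might represent these reasons as an enumeration
   attached to its error events.  The Python sketch below is one such
   representation; the class names "ErrorReason" and "TransportError"
   and the constructor parameters are hypothetical, and only the reason
   names are taken from the list above.

```python
from enum import Enum


class ErrorReason(Enum):
    INVALID_CONFIGURATION = "InvalidConfiguration"
    NO_CANDIDATES = "NoCandidates"
    RESOLUTION_FAILED = "ResolutionFailed"
    ESTABLISHMENT_FAILED = "EstablishmentFailed"
    POLICY_PROHIBITED = "PolicyProhibited"
    NOT_CLONEABLE = "NotCloneable"
    MESSAGE_TOO_LARGE = "MessageTooLarge"
    PROTOCOL_FAILED = "ProtocolFailed"
    INVALID_MESSAGE_PROPERTIES = "InvalidMessageProperties"
    DEFRAMING_FAILED = "DeframingFailed"
    CONNECTION_ABORTED = "ConnectionAborted"
    TIMEOUT = "Timeout"


class TransportError(Exception):
    """Carries a generic error event name plus a detailed reason."""

    def __init__(self, event: str, reason: ErrorReason, detail: str = "") -> None:
        text = event + ": " + reason.value
        if detail:
            text += " (" + detail + ")"
        super().__init__(text)
        self.event = event
        self.reason = reason
```

   An application can then branch on "reason" instead of parsing the
   error text.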

2289	Appendix C.  Existing Implementations

2291	   This appendix gives an overview of existing implementations, at the
2292	   time of writing, of transport systems that are (to some degree) in
2293	   line with this document.

2295	   *  Apple's Network.framework:

2297	      -  Network.framework is a transport-level API built for C,
2298	         Objective-C, and Swift.  It is a connect-by-name API that
2299	         supports transport security protocols.  It provides userspace
2300	         implementations of TCP, UDP, TLS, DTLS, and proxy protocols,
2301	         and allows extension via custom framers.

2303	      -  Documentation:
2304	         https://developer.apple.com/documentation/network

2306	   *  NEAT and NEATPy:

2308	      -  NEAT is the output of the European H2020 research project
2309	         "NEAT"; it is a user-space library for protocol-independent
2310	         communication on top of TCP, UDP and SCTP, with many more
2311	         features such as a policy manager.

2313	      -  Code: https://github.com/NEAT-project/neat

2316	      -  NEAT project: https://www.neat-project.org

2319	      -  NEATPy is a Python shim over NEAT which updates the NEAT API to
2320	         be in line with version 6 of the TAPS interface draft.

2322	      -  Code: https://github.com/theagilepadawan/NEATPy

2325	   *  PyTAPS:

2327	      -  A TAPS implementation based on Python asyncio, offering
2328	         protocol-independent communication to applications on top of
2329	         TCP, UDP and TLS, with support for multicast.

2331	      -  Code: https://github.com/fg-inet/python-asyncio-taps

2334	Authors' Addresses

2336	   Anna Brunstrom (editor)
2337	   Karlstad University
2338	   Universitetsgatan 2
2339	   651 88 Karlstad
2340	   Sweden

2342	   Email: anna.brunstrom@kau.se

2344	   Tommy Pauly (editor)
2345	   Apple Inc.
2346	   One Apple Park Way
2347	   Cupertino, California 95014,
2348	   United States of America

2350	   Email: tpauly@apple.com

2352	   Theresa Enghardt
2353	   Netflix
2354	   121 Albright Way
2355	   Los Gatos, CA 95032,
2356	   United States of America

2358	   Email: ietf@tenghardt.net

2360	   Karl-Johan Grinnemo
2361	   Karlstad University
2362	   Universitetsgatan 2
2363	   651 88 Karlstad
2364	   Sweden

2366	   Email: karl-johan.grinnemo@kau.se

2368	   Tom Jones
2369	   University of Aberdeen
2370	   Fraser Noble Building
2371	   Aberdeen, AB24 3UE
2372	   United Kingdom

2374	   Email: tom@erg.abdn.ac.uk

2375	   Philipp S. Tiesel
2376	   TU Berlin
2377	   Einsteinufer 25
2378	   10587 Berlin
2379	   Germany

2381	   Email: philipp@tiesel.net

2383	   Colin Perkins
2384	   University of Glasgow
2385	   School of Computing Science
2386	   Glasgow G12 8QQ
2387	   United Kingdom

2389	   Email: csp@csperkins.org

2391	   Michael Welzl
2392	   University of Oslo
2393	   PO Box 1080 Blindern
2394	   0316  Oslo
2395	   Norway

2397	   Email: michawe@ifi.uio.no