idnits 2.17.1 

draft-ietf-taps-impl-12.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** There are 7 instances of too long lines in the document, the longest one
     being 41 characters in excess of 72.

  ** The document seems to lack a both a reference to RFC 2119 and the
     recommended RFC 2119 boilerplate, even if it appears to use RFC 2119
     keywords. 

     RFC 2119 keyword, line 1320: '... Implementations SHOULD ensure that th...'
     RFC 2119 keyword, line 1934: '... Transport Services API MUST provide a...'
     RFC 2119 keyword, line 1941: '...   [RFC6525] MUST be supported by both the client and the server side....'
     RFC 2119 keyword, line 1943: '...locking, stream mapping SHOULD only be...'
     RFC 2119 keyword, line 1951: '...   been created, MUST always use strea...'
     (3 more instances...)


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (7 March 2022) is 782 days in the past.  Is this
     intentional?


  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  == Outdated reference: A later version (-19) exists of
     draft-ietf-taps-arch-12

  == Outdated reference: A later version (-26) exists of
     draft-ietf-taps-interface-14

  ** Obsolete normative reference: RFC 7540 (Obsoleted by RFC 9113)

  -- Obsolete informational reference (is this intentional?): RFC 5389
     (Obsoleted by RFC 8489)

  -- Obsolete informational reference (is this intentional?): RFC 5766
     (Obsoleted by RFC 8656)

  -- Obsolete informational reference (is this intentional?): RFC 7230
     (Obsoleted by RFC 9110, RFC 9112)


     Summary: 3 errors (**), 0 flaws (~~), 3 warnings (==), 4 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	TAPS Working Group                                     A. Brunstrom, Ed.
3	Internet-Draft                                       Karlstad University
4	Intended status: Informational                             T. Pauly, Ed.
5	Expires: 8 September 2022                                     Apple Inc.
6	                                                             T. Enghardt
7	                                                                 Netflix
8	                                                               P. Tiesel
9	                                                                  SAP SE
10	                                                                M. Welzl
11	                                                      University of Oslo
12	                                                            7 March 2022

14	             Implementing Interfaces to Transport Services
15	                        draft-ietf-taps-impl-12

17	Abstract

19	   The Transport Services system enables applications to use transport
20	   protocols flexibly for network communication and defines a protocol-
21	   independent Transport Services Application Programming Interface
22	   (API) that is based on an asynchronous, event-driven interaction
23	   pattern.  This document serves as a guide to implementation on how to
24	   build such a system.

26	Status of This Memo

28	   This Internet-Draft is submitted in full conformance with the
29	   provisions of BCP 78 and BCP 79.

31	   Internet-Drafts are working documents of the Internet Engineering
32	   Task Force (IETF).  Note that other groups may also distribute
33	   working documents as Internet-Drafts.  The list of current Internet-
34	   Drafts is at https://datatracker.ietf.org/drafts/current/.

36	   Internet-Drafts are draft documents valid for a maximum of six months
37	   and may be updated, replaced, or obsoleted by other documents at any
38	   time.  It is inappropriate to use Internet-Drafts as reference
39	   material or to cite them other than as "work in progress."

41	   This Internet-Draft will expire on 8 September 2022.

43	Copyright Notice

45	   Copyright (c) 2022 IETF Trust and the persons identified as the
46	   document authors.  All rights reserved.

48	   This document is subject to BCP 78 and the IETF Trust's Legal
49	   Provisions Relating to IETF Documents (https://trustee.ietf.org/
50	   license-info) in effect on the date of publication of this document.
51	   Please review these documents carefully, as they describe your rights
52	   and restrictions with respect to this document.  Code Components
53	   extracted from this document must include Revised BSD License text as
54	   described in Section 4.e of the Trust Legal Provisions and are
55	   provided without warranty as described in the Revised BSD License.

57	Table of Contents

59	   1.  Introduction
60	   2.  Implementing Connection Objects
61	   3.  Implementing Pre-Establishment
62	     3.1.  Configuration-time errors
63	     3.2.  Role of system policy
64	   4.  Implementing Connection Establishment
65	     4.1.  Structuring Candidates as a Tree
66	       4.1.1.  Branch Types
67	       4.1.2.  Branching Order-of-Operations
68	       4.1.3.  Sorting Branches
69	     4.2.  Candidate Gathering
70	       4.2.1.  Gathering Endpoint Candidates
71	     4.3.  Candidate Racing
72	       4.3.1.  Simultaneous
73	       4.3.2.  Staggered
74	       4.3.3.  Failover
75	     4.4.  Completing Establishment
76	       4.4.1.  Determining Successful Establishment
77	     4.5.  Establishing multiplexed connections
78	     4.6.  Handling connectionless protocols
79	     4.7.  Implementing listeners
80	       4.7.1.  Implementing listeners for Connected Protocols
81	       4.7.2.  Implementing listeners for Connectionless
82	               Protocols
83	       4.7.3.  Implementing listeners for Multiplexed Protocols
84	   5.  Implementing Sending and Receiving Data
85	     5.1.  Sending Messages
86	       5.1.1.  Message Properties
87	       5.1.2.  Send Completion
88	       5.1.3.  Batching Sends
89	     5.2.  Receiving Messages
90	     5.3.  Handling of data for fast-open protocols
91	   6.  Implementing Message Framers
92	     6.1.  Defining Message Framers
93	     6.2.  Sender-side Message Framing
94	     6.3.  Receiver-side Message Framing
95	   7.  Implementing Connection Management
96	     7.1.  Pooled Connection
97	     7.2.  Handling Path Changes
98	   8.  Implementing Connection Termination
99	   9.  Cached State
100	     9.1.  Protocol state caches
101	     9.2.  Performance caches
102	   10. Specific Transport Protocol Considerations
103	     10.1.  TCP
104	     10.2.  MPTCP
105	     10.3.  UDP
106	     10.4.  UDP-Lite
107	     10.5.  UDP Multicast Receive
108	     10.6.  SCTP
109	   11. IANA Considerations
110	   12. Security Considerations
111	     12.1.  Considerations for Candidate Gathering
112	     12.2.  Considerations for Candidate Racing
113	   13. Acknowledgements
114	   14. References
115	     14.1.  Normative References
116	     14.2.  Informative References
117	   Appendix A.  API Mapping Template
118	   Appendix B.  Additional Properties
119	     B.1.  Properties Affecting Sorting of Branches
120	   Appendix C.  Reasons for errors
121	   Appendix D.  Existing Implementations
122	   Authors' Addresses

124	1.  Introduction

126	   The Transport Services architecture [I-D.ietf-taps-arch] defines a
127	   system that allows applications to flexibly use transport networking
128	   protocols.  The API that such a system exposes to applications is
129	   defined as the Transport Services API [I-D.ietf-taps-interface].
130	   This API is designed to be generic across multiple transport
131	   protocols and sets of protocol features.

133	   This document serves as a guide to implementation on how to build a
134	   system that provides a Transport Services API.  It is the job of an
135	   implementation of a Transport Services system to turn the requests of
136	   an application into decisions on how to establish connections, and
137	   how to transfer data over those connections once established.  The
138	   terminology used in this document is based on the Architecture
139	   [I-D.ietf-taps-arch].

141	2.  Implementing Connection Objects

143	   The connection objects that are exposed to applications for Transport
144	   Services are:

146	   *  the Preconnection, the bundle of Properties that describes the
147	      application constraints on, and preferences for, the transport;

149	   *  the Connection, the basic object that represents a flow of data as
150	      Messages in either direction between the Local and Remote
151	      Endpoints;

153	   *  and the Listener, a passive waiting object that delivers new
154	      Connections.

156	   Preconnection objects should be implemented as bundles of properties
157	   that an application can both read and write.  A Preconnection object
158	   influences a Connection only at one point in time: when the
159	   Connection is created.  Connection objects represent the interface
160	   between the application and the implementation to manage transport
161	   state, and conduct data transfer.  During the process of
162	   establishment (Section 4), the Connection will not be bound to a
163	   specific transport protocol instance, since multiple candidate
164	   Protocol Stacks might be raced.

166	   Once a Preconnection has been used to create an outbound Connection
167	   or a Listener, the implementation should ensure that the copy of the
168	   properties held by the Connection or Listener cannot be mutated by
169	   the application making changes to the original Preconnection object.
170	   This may involve the implementation performing a deep-copy, copying
171	   the object with all the objects that it references.
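
   The deep-copy step described above can be illustrated with a minimal
   sketch.  The Preconnection and Connection classes, their attribute
   names, and the property values below are hypothetical stand-ins and
   are not taken from the Transport Services API; only copy.deepcopy is
   a standard library call.

```python
import copy


class Connection:
    """Hypothetical Connection holding its own snapshot of the properties."""

    def __init__(self, remote_endpoint, selection_properties):
        self.remote_endpoint = remote_endpoint
        self.selection_properties = selection_properties


class Preconnection:
    """Hypothetical Preconnection: a mutable bundle of properties."""

    def __init__(self, remote_endpoint, selection_properties):
        self.remote_endpoint = remote_endpoint
        self.selection_properties = selection_properties

    def initiate(self):
        # Deep-copy so that nested objects (lists, dicts) are not shared
        # between the Preconnection and the new Connection.
        return Connection(copy.deepcopy(self.remote_endpoint),
                          copy.deepcopy(self.selection_properties))


pre = Preconnection({"hostname": "www.example.com", "port": 443},
                    {"interface": {"prefer": ["Wi-Fi"]}})
conn = pre.initiate()

# A later change to the Preconnection does not reach the Connection.
pre.selection_properties["interface"]["prefer"].append("LTE")
```

   A shallow copy would not be sufficient in this sketch, because the
   nested "prefer" list would still be shared between both objects.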

173	   Once the Connection is established, the Transport Services
174	   implementation maps actions and events to the details of the chosen
175	   Protocol Stack.  For example, the same Connection object may
176	   ultimately represent a single instance of one transport protocol
177	   (e.g., a TCP connection, a TLS session over TCP, a UDP flow with
178	   fully-specified Local and Remote Endpoints, a DTLS session, an SCTP
179	   stream, a QUIC stream, or an HTTP/2 stream).  The properties held by
180	   a Connection or Listener are independent of other connections that
181	   are not part of the same Connection Group.

183	   Connection establishment is only a local operation for a Datagram
184	   transport (e.g., UDP(-Lite)), which serves to simplify the local
185	   send/receive functions and to filter the traffic for the specified
186	   addresses and ports [RFC8085].

188	   Once Initiate has been called, the Selection Properties and Endpoint
189	   information are immutable (i.e., an application is not able to later
190	   modify Selection Properties on the original Preconnection object).
191	   Listener objects are created with a Preconnection, at which point
192	   their configuration should be considered immutable by the
193	   implementation.  The process of listening is described in
194	   Section 4.7.

196	3.  Implementing Pre-Establishment

198	   During pre-establishment the application specifies one or more
199	   Endpoints to be used for communication as well as protocol
200	   preferences and constraints via Selection Properties and, if desired,
201	   also Connection Properties.  Generally, Connection Properties should
202	   be configured as early as possible, because they can serve as input
203	   to decisions that are made by the implementation (e.g., the Capacity
204	   Profile can guide usage of a protocol offering scavenger-type
205	   congestion control).

207	   The implementation stores these properties as a part of the
208	   Preconnection object for use during connection establishment.  For
209	   Selection Properties that are not provided by the application, the
210	   implementation must use the default values specified in the Transport
211	   Services API ([I-D.ietf-taps-interface]).

213	3.1.  Configuration-time errors

215	   The Transport Services system should have a list of supported
216	   protocols available, which each have transport features reflecting
217	   the capabilities of the protocol.  Once an application specifies its
218	   Transport Properties, the transport system matches the required and
219	   prohibited properties against the transport features of the available
220	   protocols.

222	   In the following cases, failure should be detected during pre-
223	   establishment:

225	   *  A request by an application for Protocol Properties that cannot be
226	      satisfied by any of the available protocols.  For example, if an
227	      application requires "Configure Reliability per Message", but no
228	      such feature is available in any protocol on the host running the
229	      transport system, this should result in an error, e.g., when SCTP
230	      is not supported by the operating system.

233	   *  A request by an application for Protocol Properties that are in
234	      conflict with each other, i.e., the required and prohibited
235	      properties cannot be satisfied by the same protocol.  For example,
236	      if an application prohibits "Reliable Data Transfer" but then
237	      requires "Configure Reliability per Message", this mismatch should
238	      result in an error.

240	   To avoid allocating resources that are not finally needed, it is
241	   important that configuration-time errors fail as early as possible.
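
   The two failure cases above can be sketched as a single check that
   runs before any resources are allocated.  This is a minimal
   illustration under stated assumptions: the feature table, the
   ConfigurationError class, and the function names are hypothetical,
   and a real system would derive the table from the protocols actually
   available on the host.

```python
# Illustrative feature table (an assumption, not a normative list).
AVAILABLE_PROTOCOLS = {
    "TCP": {"Reliable Data Transfer"},
    "UDP": set(),
    "SCTP": {"Reliable Data Transfer", "Configure Reliability per Message"},
}


class ConfigurationError(Exception):
    """Raised during pre-establishment, before resources are allocated."""


def matching_protocols(required, prohibited, available=AVAILABLE_PROTOCOLS):
    """Return protocols offering every required and no prohibited feature."""
    matches = [
        name for name, features in available.items()
        if required <= features and not (prohibited & features)
    ]
    if not matches:
        raise ConfigurationError(
            f"no protocol satisfies require={sorted(required)}, "
            f"prohibit={sorted(prohibited)}"
        )
    return matches


def fails(required, prohibited, available=AVAILABLE_PROTOCOLS):
    """Helper for the examples: True if the request is rejected."""
    try:
        matching_protocols(required, prohibited, available)
    except ConfigurationError:
        return True
    return False


# Case 1: the feature is required, but SCTP is not available on the host.
without_sctp = {k: v for k, v in AVAILABLE_PROTOCOLS.items() if k != "SCTP"}
unsupported = fails({"Configure Reliability per Message"}, set(), without_sctp)

# Case 2: the required and prohibited properties conflict with each other.
conflicting = fails({"Configure Reliability per Message"},
                    {"Reliable Data Transfer"})
```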

243	3.2.  Role of system policy

245	   The properties specified during pre-establishment have a close
246	   relationship to system policy.  The implementation is responsible for
247	   combining and reconciling several different sources of preferences
248	   when establishing Connections.  These include, but are not limited
249	   to:

251	   1.  Application preferences, i.e., preferences specified during the
252	       pre-establishment via Selection Properties.

254	   2.  Dynamic system policy, i.e., policy compiled from internally and
255	       externally acquired information about available network
256	       interfaces, supported transport protocols, and current/previous
257	       Connections.  Examples of ways to externally retrieve policy-
258	       support information are through OS-specific statistics/
259	       measurement tools and tools that reside on middleboxes and
260	       routers.

262	   3.  Default implementation policy, i.e., predefined policy by OS or
263	       application.

265	   In general, any protocol or path used for a connection must conform
266	   to all three sources of constraints.  A violation that occurs at any
267	   of the policy layers should cause a protocol or path to be considered
268	   ineligible for use.  For an example of application preferences
269	   leading to constraints, an application may prohibit the use of
270	   metered network interfaces for a given Connection to avoid user cost.
271	   Similarly, the system policy at a given time may prohibit the use of
272	   such a metered network interface from the application's process.
273	   Lastly, the implementation itself may default to disallowing certain
274	   network interfaces unless explicitly requested by the application and
275	   allowed by the system.

277	   It is expected that the database of system policies and the method of
278	   looking up these policies will vary across various platforms.  An
279	   implementation should attempt to look up the relevant policies for
280	   the system in a dynamic way to make sure it is reflecting an accurate
281	   version of the system policy, since the system's policy regarding the
282	   application's traffic may change over time due to user or
283	   administrative changes.

285	4.  Implementing Connection Establishment

287	   The process of establishing a network connection begins when an
288	   application expresses intent to communicate with a Remote Endpoint by
289	   calling Initiate.  (At this point, any constraints or requirements
290	   the application may have on the connection are available from pre-
291	   establishment.)  The process can be considered complete once there is
292	   at least one Protocol Stack that has completed any required setup to
293	   the point that it can transmit and receive the application's data.

295	   Connection establishment is divided into two top-level steps:
296	   Candidate Gathering, to identify the paths, protocols, and endpoints
297	   to use, and Candidate Racing (see Section 4.2.2 of
298	   [I-D.ietf-taps-arch]), in which the necessary protocol handshakes are
299	   conducted so that the transport system can select which set to use.

301	   This document structures the candidates for racing as a tree as a
302	   terminological convention.  While a tree structure is not the only
303	   way in which racing can be implemented, it does ease the illustration
304	   of how racing works.

306	   The most simple example of this process might involve identifying the
307	   single IP address to which the implementation wishes to connect,
308	   using the system's current default path (i.e., using the default
309	   interface), and starting a TCP handshake to establish a stream to the
310	   specified IP address.  However, each step may also differ depending
311	   on the requirements of the connection: if the endpoint is defined as
312	   a hostname and port, then there may be multiple resolved addresses
313	   that are available; there may also be multiple paths available (in
314	   this case using an interface other than the default system
315	   interface); and some protocols may not need any transport handshake
316	   to be considered "established" (such as UDP), while other connections
317	   may utilize layered protocol handshakes, such as TLS over TCP.

319	   Whenever an implementation has multiple options for connection
320	   establishment, it can view the set of all individual connection
321	   establishment options as a single, aggregate connection
322	   establishment.  The aggregate set conceptually includes every valid
323	   combination of endpoints, paths, and protocols.  As an example,
324	   consider an implementation that initiates a TCP connection to a
325	   hostname + port endpoint, and has two valid interfaces available (Wi-
326	   Fi and LTE).  The hostname resolves to a single IPv4 address on the
327	   Wi-Fi network, and resolves to the same IPv4 address on the LTE
328	   network, as well as a single IPv6 address.  The aggregate set of
329	   connection establishment options can be viewed as follows:

331	Aggregate [Endpoint: www.example.com:80] [Interface: Any]   [Protocol: TCP]
332	|-> [Endpoint: 192.0.2.1:80]       [Interface: Wi-Fi] [Protocol: TCP]
333	|-> [Endpoint: 192.0.2.1:80]       [Interface: LTE]   [Protocol: TCP]
334	|-> [Endpoint: 2001:DB8::1.80]     [Interface: LTE]   [Protocol: TCP]

336	   Any one of these sub-entries on the aggregate connection attempt
337	   would satisfy the original application intent.  The concern of this
338	   section is the algorithm defining which of these options to try,
339	   when, and in what order.

341	   During Candidate Gathering, an implementation first excludes all
342	   protocols and paths that match a Prohibit or do not match all Require
343	   properties.  Then, the implementation will sort branches according to
344	   Preferred properties, Avoided properties, and possibly other
345	   criteria.
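
   The two gathering steps (exclusion, then sorting) might be sketched
   as follows.  The candidate representation, the feature names, and in
   particular the scoring rule (count of Preferred features minus count
   of Avoided features) are assumptions made for illustration; this
   document does not prescribe a scoring function.

```python
def gather(candidates, properties):
    """Filter, then rank, candidate (name, features) pairs.

    properties maps a feature name to one of
    "require", "prefer", "avoid", or "prohibit".
    """
    def names(level):
        return {f for f, p in properties.items() if p == level}

    required, prohibited = names("require"), names("prohibit")
    preferred, avoided = names("prefer"), names("avoid")

    # Step 1: exclusion.  Require and Prohibit are hard constraints.
    eligible = [
        (name, features) for name, features in candidates
        if required <= features and not (prohibited & features)
    ]

    # Step 2: ranking.  This scoring rule is an assumption of the sketch.
    def score(item):
        _, features = item
        return len(preferred & features) - len(avoided & features)

    # sorted() is stable, so candidates with equal scores keep input order.
    return [name for name, _ in sorted(eligible, key=score, reverse=True)]


candidates = [
    ("LTE/TCP", {"reliability", "metered"}),
    ("Wi-Fi/UDP", set()),
    ("Wi-Fi/TCP", {"reliability"}),
]
order = gather(candidates, {"reliability": "require", "metered": "avoid"})
```

   Here the UDP candidate is excluded because it lacks the required
   feature, and the metered candidate is kept but ranked last.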

347	4.1.  Structuring Candidates as a Tree

349	   As noted above, the consideration of multiple candidates in a
350	   gathering and racing process can be conceptually structured as a
351	   tree; this terminological convention is used throughout this
352	   document.

354	   Each leaf node of the tree represents a single, coherent connection
355	   attempt, with an endpoint, a network path, and a set of protocols
356	   that can directly negotiate and send data on the network.  Each node
357	   in the tree that is not a leaf represents a connection attempt that
358	   is either underspecified, or else includes multiple distinct options.
359	   For example, when connecting on an IP network, a connection attempt
360	   to a hostname and port is underspecified, because the connection
361	   attempt requires a resolved IP address as its Remote Endpoint.  In
362	   this case, the node represented by the connection attempt to the
363	   hostname is a parent node, with child nodes for each IP address.
364	   Similarly, an implementation that is allowed to connect using
365	   multiple interfaces will have a parent node of the tree for the
366	   decision between the network paths, with a branch for each interface.

368	   The example aggregate connection attempt above can be drawn as a tree
369	   by grouping the addresses resolved on the same interface into
370	   branches:

372	                             ||
373	                +==========================+
374	                |  www.example.com:80/Any  |
375	                +==========================+
376	                  //                    \\
377	+==========================+       +==========================+
378	| www.example.com:80/Wi-Fi |       |  www.example.com:80/LTE  |
379	+==========================+       +==========================+
380	             ||                      //                    \\
381	  +====================+  +====================+  +======================+
382	  | 192.0.2.1:80/Wi-Fi |  |  192.0.2.1:80/LTE  |  |  2001:DB8::1.80/LTE  |
383	  +====================+  +====================+  +======================+

385	   The rest of this section will use a notation scheme to represent this
386	   tree.  The parent (or trunk) node of the tree will be represented by
387	   a single integer, such as "1".  Each child of that node will have an
388	   integer that identifies it, from 1 to the number of children.  That
389	   child node will be uniquely identified by concatenating its integer
390	   to its parent's identifier with a dot in between, such as "1.1" and
391	   "1.2".  Each node will be summarized by a tuple of three elements:
392	   endpoint, path (labeled here by interface), and protocol.  The above
393	   example can now be written more succinctly as:

395	   1 [www.example.com:80, Any, TCP]
396	     1.1 [www.example.com:80, Wi-Fi, TCP]
397	       1.1.1 [192.0.2.1:80, Wi-Fi, TCP]
398	     1.2 [www.example.com:80, LTE, TCP]
399	       1.2.1 [192.0.2.1:80, LTE, TCP]
400	       1.2.2 [2001:DB8::1.80, LTE, TCP]
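
   As an illustration of the notation, the following sketch builds the
   example tree and derives each identifier by appending the child's
   index to the identifier of its parent.  The Node class and the
   render function are hypothetical names used only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    endpoint: str
    path: str
    protocol: str
    children: list = field(default_factory=list)


def render(node, ident="1", depth=0):
    """Return one line per node, in the notation used in this section."""
    indent = "  " * depth
    lines = [f"{indent}{ident} [{node.endpoint}, {node.path}, {node.protocol}]"]
    for index, child in enumerate(node.children, start=1):
        lines.extend(render(child, f"{ident}.{index}", depth + 1))
    return lines


tree = Node("www.example.com:80", "Any", "TCP", [
    Node("www.example.com:80", "Wi-Fi", "TCP", [
        Node("192.0.2.1:80", "Wi-Fi", "TCP"),
    ]),
    Node("www.example.com:80", "LTE", "TCP", [
        Node("192.0.2.1:80", "LTE", "TCP"),
        Node("2001:DB8::1.80", "LTE", "TCP"),
    ]),
])

print("\n".join(render(tree)))
```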

402	   When an implementation views this aggregate set of connection
403	   attempts as a single connection establishment, it will use only one
404	   of the leaf nodes to transfer data.  Thus, when a single leaf node
405	   becomes ready to use, then the entire connection attempt is ready to
406	   use by the application.  Another way to represent this is that every
407	   leaf node updates the state of its parent node when it becomes ready,
408	   until the trunk node of the tree is ready, which then notifies the
409	   application that the connection as a whole is ready to use.
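
   The upward propagation of readiness can be sketched as below.  The
   CandidateNode class, its attributes, and the callback are
   hypothetical; the sketch assumes that the first leaf to become ready
   wins and that the application is notified exactly once.

```python
class CandidateNode:
    def __init__(self, ident, parent=None, on_ready=None):
        self.ident = ident
        self.parent = parent
        self.on_ready = on_ready   # only the trunk needs a callback
        self.ready_leaf = None     # the leaf that made this node ready

    def mark_ready(self, leaf=None):
        leaf = leaf if leaf is not None else self
        if self.ready_leaf is not None:
            return                 # an earlier leaf already won here
        self.ready_leaf = leaf
        if self.parent is not None:
            self.parent.mark_ready(leaf)   # update the parent's state
        elif self.on_ready is not None:
            self.on_ready(leaf)            # trunk: notify the application


notifications = []
trunk = CandidateNode("1", on_ready=lambda leaf: notifications.append(leaf.ident))
wifi = CandidateNode("1.1", trunk)
lte = CandidateNode("1.2", trunk)
wifi_v4 = CandidateNode("1.1.1", wifi)
lte_v4 = CandidateNode("1.2.1", lte)
lte_v6 = CandidateNode("1.2.2", lte)

lte_v6.mark_ready()    # first leaf to finish its handshake
wifi_v4.mark_ready()   # a later success does not notify again
```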

411	   A connection establishment tree may be degenerate, and only have a
412	   single leaf node, such as a connection attempt to an IP address over
413	   a single interface with a single protocol.

415	   1 [192.0.2.1:80, Wi-Fi, TCP]

417	   A parent node may also only have one child (or leaf) node, such as
418	   when a hostname resolves to only a single IP address.

420	   1 [www.example.com:80, Wi-Fi, TCP]
421	     1.1 [192.0.2.1:80, Wi-Fi, TCP]

423	4.1.1.  Branch Types

425	   There are three types of branching from a parent node into one or
426	   more child nodes.  Any parent node of the tree must only use one type
427	   of branching.

429	4.1.1.1.  Derived Endpoints

431	   If a connection originally targets a single endpoint, there may be
432	   multiple endpoints of different types that can be derived from the
433	   original.  This creates an ordered list of the derived endpoints
434	   according to application preference, system policy and expected
435	   performance.

437	   DNS hostname-to-address resolution is the most common method of
438	   endpoint derivation.  When trying to connect to a hostname endpoint
439	   on a traditional IP network, the implementation should send DNS
440	   queries for both A (IPv4) and AAAA (IPv6) records if both are
441	   supported on the local interface.  The algorithm for ordering and
442	   racing these addresses should follow the recommendations in Happy
443	   Eyeballs [RFC8305].

445	   1 [www.example.com:80, Wi-Fi, TCP]
446	     1.1 [2001:DB8::1.80, Wi-Fi, TCP]
447	     1.2 [192.0.2.1:80, Wi-Fi, TCP]
448	     1.3 [2001:DB8::2.80, Wi-Fi, TCP]
449	     1.4 [2001:DB8::3.80, Wi-Fi, TCP]

451	   DNS-Based Service Discovery [RFC6763] can also provide an endpoint
452	   derivation step.  When trying to connect to a named service, the
453	   client may discover one or more hostname and port pairs on the local
454	   network using multicast DNS [RFC6762].  These hostnames should each
455	   be treated as a branch that can be attempted independently from other
456	   hostnames.  Each of these hostnames might resolve to one or more
457	   addresses, which would create multiple layers of branching.

459	   1 [term-printer._ipp._tcp.meeting.ietf.org, Wi-Fi, TCP]
460	     1.1 [term-printer.meeting.ietf.org:631, Wi-Fi, TCP]
461	       1.1.1 [31.133.160.18.631, Wi-Fi, TCP]

463	   Applications can influence which derived endpoints are allowed and
464	   preferred via Selection Properties set on the Preconnection.  For
465	   example, setting a preference for useTemporaryLocalAddress would
466	   prefer the use of IPv6 over IPv4, and requiring
467	   useTemporaryLocalAddress would eliminate IPv4 options, since IPv4
468	   does not support temporary addresses.

470	4.1.1.2.  Alternate Paths

472	   If a client has multiple network paths available to it, e.g., a
473	   mobile client with interfaces for both Wi-Fi and Cellular
474	   connectivity, it can attempt a connection over any of the paths.
475	   This represents a branch point in the connection establishment.
476	   Similar to a derived endpoint, the paths should be ranked based on
477	   preference, system policy, and performance.  Attempts should be
478	   started on one path (e.g., a specific interface), and then
479	   successively on other paths (or interfaces) after delays based on
480	   expected path round-trip-time or other available metrics.

482	   1 [192.0.2.1:80, Any, TCP]
483	     1.1 [192.0.2.1:80, Wi-Fi, TCP]
484	     1.2 [192.0.2.1:80, LTE, TCP]
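
   One possible way to derive start times for successive path attempts
   is sketched below.  It assumes that the paths are already ranked and
   that each attempt is delayed by roughly one expected round trip of
   the previous path.  The function name, the input format, and the
   250 ms fallback used when no estimate is known are assumptions of
   this sketch.

```python
def staggered_schedule(ranked_paths, default_delay_ms=250):
    """Return (start_ms, path) pairs for paths already sorted by preference.

    ranked_paths is a list of (path_name, expected_rtt_ms or None).
    """
    schedule = []
    start_ms = 0
    for name, rtt_ms in ranked_paths:
        schedule.append((start_ms, name))
        # Wait roughly one round trip on this path before trying the next.
        start_ms += rtt_ms if rtt_ms is not None else default_delay_ms
    return schedule


plan = staggered_schedule([("Wi-Fi", 40), ("LTE", 90), ("VPN", None)])
```

   An implementation would cancel the remaining entries of such a plan
   as soon as one attempt succeeds.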

486	   This same approach applies to any situation in which the client is
487	   aware of multiple links or views of the network.  A single interface
488	   may be shared by multiple network paths, each with a coherent set of
489	   addresses, routes, DNS server, and more.  A path may also represent a
490	   virtual interface service such as a Virtual Private Network (VPN).

492	   The list of available paths should be constrained by any requirements
493	   the application sets, as well as by the system policy.

495	4.1.1.3.  Protocol Options

497	   Differences in possible protocol compositions and options can also
498	   provide a branching point in connection establishment.  This allows
499	   clients to be resilient to situations in which a certain protocol is
500	   not functioning on a server or network.

502	   This approach is commonly used for connections with optional proxy
503	   server configurations.  A single connection might have several
504	   options available: an HTTP-based proxy, a SOCKS-based proxy, or no
505	   proxy.  These options should be ranked and attempted in succession.

507	   1 [www.example.com:80, Any, HTTP/TCP]
508	     1.1 [192.0.2.8:80, Any, HTTP/HTTP Proxy/TCP]
509	     1.2 [192.0.2.7:10234, Any, HTTP/SOCKS/TCP]
510	     1.3 [www.example.com:80, Any, HTTP/TCP]
511	       1.3.1 [192.0.2.1:80, Any, HTTP/TCP]

513	   This approach also allows a client to attempt different sets of
514	   application and transport protocols that, when available, could
515	   provide preferable features.  For example, the protocol options could
516	   involve QUIC [I-D.ietf-quic-transport] over UDP on one branch, and
517	   HTTP/2 [RFC7540] over TLS over TCP on the other:

519	   1 [www.example.com:443, Any, Any HTTP]
520	     1.1 [www.example.com:443, Any, QUIC/UDP]
521	       1.1.1 [192.0.2.1:443, Any, QUIC/UDP]
522	     1.2 [www.example.com:443, Any, HTTP2/TLS/TCP]
523	       1.2.1 [192.0.2.1:443, Any, HTTP2/TLS/TCP]

525	   Another example is racing SCTP with TCP:

527	   1 [www.example.com:80, Any, Any Stream]
528	     1.1 [www.example.com:80, Any, SCTP]
529	       1.1.1 [192.0.2.1:80, Any, SCTP]
530	     1.2 [www.example.com:80, Any, TCP]
531	       1.2.1 [192.0.2.1:80, Any, TCP]

533	   Implementations that support racing protocols and protocol options
534	   should maintain a history of which protocols and protocol options
535	   successfully established, on a per-network and per-endpoint basis
536	   (see Section 9.2).  This information can influence future racing
537	   decisions to prioritize or prune branches.

539	4.1.2.  Branching Order-of-Operations

541	   Branch types must occur in a specific order relative to one another
542	   to avoid creating leaf nodes with invalid or incompatible settings.
543	   In the example above, it would be invalid to branch for derived
544	   endpoints (the DNS results for www.example.com) before branching
545	   between interface paths, since there are situations when the results
546	   will be different across networks due to private names or different
547	   supported IP versions.  Implementations must be careful to branch in
548	   an order that results in usable leaf nodes whenever there are
549	   multiple branch types that could be used from a single node.

551	   The order of operations for branching should be:

553	   1.  Alternate Paths
554	   2.  Protocol Options
556	   3.  Derived Endpoints

558	   where a lower number indicates higher precedence and therefore higher
559	   placement in the tree.  Branching between paths is the first in the
560	   list because results across multiple interfaces are likely not
561	   related to one another: endpoint resolution may return different
562	   results, especially when using locally resolved host and service
563	   names, and which protocols are supported and preferred may differ
564	   across interfaces.  Thus, if multiple paths are attempted, the
565	   overall connection can be seen as a race between the available paths
566	   or interfaces.

568	   Protocol options are next checked in order.  Whether or not a set of
569	   protocols, or protocol-specific options, can successfully connect is
570	   generally not dependent on which specific IP address is used.
571	   Furthermore, the protocol stacks being attempted may influence or
572	   altogether change the endpoints being used.  Adding a proxy to a
573	   connection's branch will change the endpoint to the proxy's IP
574	   address or hostname.  Choosing an alternate protocol may also modify
575	   the ports that should be selected.

577	   Branching for derived endpoints is the final step, and may have
578	   multiple layers of derivation or resolution, such as DNS service
579	   resolution and DNS hostname resolution.
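
   The branching order described above can be illustrated with a minimal
   sketch.  It is not part of any Transport Services implementation: the
   Node type, build_tree(), leaves(), and the resolve callback are
   hypothetical names chosen for illustration, and the sketch assumes
   that name resolution can be performed per path.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """One candidate: [endpoint, path, protocol] (hypothetical type)."""
    endpoint: str
    path: str
    protocol: str
    children: List["Node"] = field(default_factory=list)


def build_tree(host: str, port: int, paths: List[str], protocols: List[str],
               resolve: Callable[[str, str], List[str]]) -> Node:
    """Branch by path first, then protocol, then derived endpoint."""
    root = Node(f"{host}:{port}", "Any", "Any")
    for path in paths:                          # 1. Alternate Paths
        path_node = Node(root.endpoint, path, "Any")
        for proto in protocols:                 # 2. Protocol Options
            proto_node = Node(root.endpoint, path, proto)
            # 3. Derived Endpoints: resolution is done per path, since
            # results can differ across networks.
            for addr in resolve(host, path):
                proto_node.children.append(Node(f"{addr}:{port}", path, proto))
            if proto_node.children:
                path_node.children.append(proto_node)
        if path_node.children:
            root.children.append(path_node)
    return root


def leaves(node: Node) -> List[Node]:
    """Return the leaf nodes in ranked (depth-first) order."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]
```

   Because resolution happens inside the per-path loop, each leaf carries
   an address that is valid for the path it is attached to.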

581	   For example, if the application has indicated both a preference for
582	   WiFi over LTE and for a feature only available in SCTP, branches will
583	   be first sorted according to path selection, with WiFi at the top.
584	   Then, branches with SCTP will be sorted to the top within their
585	   subtree according to the properties influencing protocol selection.
586	   However, if the implementation has current cache information that
587	   SCTP is not available on the path over WiFi, there is no SCTP node in
588	   the WiFi subtree.  Here, the path over WiFi will be tried first, and,
589	   if connection establishment succeeds, TCP will be used.  So the
590	   Selection Property of preferring WiFi takes precedence over the
591	   Property that led to a preference for SCTP.

593	   1 [www.example.com:80, Any, Any Stream]
594	     1.1 [192.0.2.1:80, Wi-Fi, Any Stream]
595	       1.1.1 [192.0.2.1:80, Wi-Fi, TCP]
596	     1.2 [192.0.3.1:80, LTE, Any Stream]
597	       1.2.1 [192.0.3.1:80, LTE, SCTP]
598	       1.2.2 [192.0.3.1:80, LTE, TCP]

600	4.1.3.  Sorting Branches

602	   Implementations should sort the branches of the tree of connection
603	   options in order of their preference rank, from most preferred to
604	   least preferred.  Leaf nodes on branches with higher rankings
605	   represent connection attempts that will be raced first.
606	   Implementations should order the branches to reflect the preferences
607	   expressed by the application for its new connection, including
608	   Selection Properties, which are specified in
609	   [I-D.ietf-taps-interface].

611	   In addition to the properties provided by the application, an
612	   implementation may include additional criteria such as cached
613	   performance estimates (see Section 9.2) or system policy (see
614	   Section 3.2) in the ranking.  Two examples of how Selection and
615	   Connection Properties may be used to sort branches are provided
616	   below:

618	   *  "Interface Instance or Type": If the application specifies an
619	      interface type to be preferred or avoided, implementations should
620	      accordingly rank the paths.  If the application specifies an
621	      interface type to be required or prohibited, an implementation is
622	      expected to not include the non-conforming paths.

624	   *  "Capacity Profile": An implementation can use the Capacity Profile
625	      to prefer paths that match an application's expected traffic
626	      pattern.  This match will use cached performance estimates, see
627	      Section 9.2:

629	      -  Scavenger: Prefer paths with the highest expected available
630	         capacity, but minimising impact on other traffic, based on the
631	         observed maximum throughput;

633	      -  Low Latency/Interactive: Prefer paths with the lowest expected
634	         Round Trip Time, based on observed round trip time estimates;

636	      -  Low Latency/Non-Interactive: Prefer paths with a low expected
637	         Round Trip Time, but can tolerate delay variation;

639	      -  Constant-Rate Streaming: Prefer paths that are expected to
640	         satisfy the requested Stream Send or Stream Receive Bitrate,
641	         based on the observed maximum throughput;

643	      -  Capacity-Seeking: Prefer adapting to paths to determine the
644	         highest available capacity, based on the observed maximum
645	         throughput.

647	   Implementations process the Properties in the following order:
648	   Prohibit, Require, Prefer, Avoid.  If Selection Properties contain
649	   any prohibited properties, the implementation should first purge
650	   branches containing nodes with these properties.  For required
651	   properties, it should only keep branches that satisfy these
652	   requirements.  Then, it should order the branches according to the
653	   preferred properties, and finally use any avoided properties as a
654	   tiebreaker.  When ordering branches, an implementation can give more
655	   weight to properties that the application has explicitly set, than to
656	   the properties that are default.
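
   The Prohibit, Require, Prefer, Avoid processing order could be
   sketched as follows.  This is an illustration under simplifying
   assumptions: each branch is modelled as a (name, set of property
   labels) pair, sort_branches() is a hypothetical helper, and every
   preferred or avoided property carries equal weight.

```python
def sort_branches(branches, prohibit=(), require=(), prefer=(), avoid=()):
    """Filter and rank branches given as (name, set_of_properties) pairs."""
    prohibit, require = set(prohibit), set(require)
    prefer, avoid = set(prefer), set(avoid)

    # 1. Prohibit: purge branches that have any prohibited property.
    kept = [b for b in branches if not (b[1] & prohibit)]
    # 2. Require: keep only branches that satisfy every requirement.
    kept = [b for b in kept if require <= b[1]]

    # 3. Prefer: more matching preferred properties ranks higher.
    # 4. Avoid: fewer avoided properties breaks ties.
    def rank(branch):
        props = branch[1]
        return (-len(props & prefer), len(props & avoid))

    # sorted() is stable, so branches that tie keep their input order.
    return sorted(kept, key=rank)
```

   An implementation that weights explicitly set properties more heavily
   than defaults would replace the simple counts in rank() with weighted
   sums.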

658	   The available protocols and paths on a specific system and in a
659	   specific context can change; therefore, the result of sorting and the
660	   outcome of racing may vary, even when using the same Selection and
661	   Connection Properties.  However, an implementation ought to provide a
662	   consistent outcome to applications, e.g., by preferring protocols and
663	   paths that are already used by existing Connections that specified
664	   similar Properties.

666	4.2.  Candidate Gathering

668	   The step of gathering candidates involves identifying which paths,
669	   protocols, and endpoints may be used for a given Connection.  This
670	   list is determined by the requirements, prohibitions, and preferences
671	   of the application as specified in the Selection Properties.

673	4.2.1.  Gathering Endpoint Candidates

675	   Both Local and Remote Endpoint Candidates must be discovered during
676	   connection establishment.  To support Interactive Connectivity
677	   Establishment (ICE) [RFC8445], or similar protocols that involve out-
678	   of-band indirect signalling to exchange candidates with the Remote
679	   Endpoint, it is important to query the set of candidate Local
680	   Endpoints, and provide the protocol stack with a set of candidate
681	   Remote Endpoints, before the Local Endpoint attempts to establish
682	   connections.

684	4.2.1.1.  Local Endpoint candidates

686	   The set of possible Local Endpoints is gathered.  In the simple case,
687	   this merely enumerates the local interfaces and protocols, and
688	   allocates ephemeral source ports.  For example, a system that has
689	   WiFi and Ethernet and supports IPv4 and IPv6 might gather four
690	   candidate Local Endpoints (IPv4 on Ethernet, IPv6 on Ethernet, IPv4
691	   on WiFi, and IPv6 on WiFi) that can form the source for a transient.
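
   In the simple case, the enumeration amounts to a cross product of
   interfaces and network protocols.  The sketch below uses a
   hypothetical gather_local_candidates() helper and leaves out port
   allocation and NAT traversal.

```python
from itertools import product


def gather_local_candidates(interfaces, ip_versions):
    """Return one (interface, ip_version) candidate per combination."""
    return list(product(interfaces, ip_versions))


candidates = gather_local_candidates(["Ethernet", "WiFi"], ["IPv4", "IPv6"])
```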

693	   If NAT traversal is required, the process of gathering Local
694	   Endpoints becomes broadly equivalent to the ICE candidate gathering
695	   phase (see Section 5.1.1 of [RFC8445]).  The endpoint determines its
696	   server reflexive Local Endpoints (i.e., the translated address of a
697	   Local Endpoint, on the other side of a NAT, e.g., via a STUN server
698	   [RFC5389]) and relayed Local Endpoints (e.g., via a TURN server
699	   [RFC5766] or other relay), for each interface and network protocol.
700	   These are added to the set of candidate Local Endpoints for this
701	   connection.

703	   Gathering Local Endpoints is primarily a local operation, although it
704	   might involve exchanges with a STUN server to derive server reflexive
705	   Local Endpoints, or with a TURN server or other relay to derive
706	   relayed Local Endpoints.  However, it does not involve communication
707	   with the Remote Endpoint.

709	4.2.1.2.  Remote Endpoint Candidates

711	   The Remote Endpoint is typically a name that needs to be resolved
712	   into a set of possible addresses that can be used for communication.
713	   Resolving the Remote Endpoint is the process of recursively
714	   performing such name lookups, until fully resolved, to return the set
715	   of candidates for the Remote Endpoint of this connection.

717	   How this resolution is done will depend on the type of the Remote
718	   Endpoint, and can also be specific to each Local Endpoint.  A common
719	   case is when the Remote Endpoint is a DNS name, in which case it is
720	   resolved to give a set of IPv4 and IPv6 addresses representing that
721	   name.  Some types of Remote Endpoint might require more complex
722	   resolution.  Resolving the Remote Endpoint for a peer-to-peer
723	   connection might involve communication with a rendezvous server,
724	   which in turn contacts the peer to gain consent to communicate and
725	   retrieve its set of candidate Local Endpoints, which are returned and
726	   form the candidate remote addresses for contacting that peer.

728	   Resolving the Remote Endpoint is not a local operation.  It will
729	   involve a directory service, and can require communication with the
730	   Remote Endpoint to rendezvous and exchange peer addresses.  This can
731	   expose some or all of the candidate Local Endpoints to the Remote
732	   Endpoint.

734	4.3.  Candidate Racing

736	   The primary goal of the Candidate Racing process is to successfully
737	   negotiate a protocol stack to an endpoint over an interface to
738	   connect a single leaf node of the tree with as little delay and as
739	   few unnecessary connection attempts as possible.  Optimizing these
740	   two factors improves the user experience, while minimizing network
741	   load.

743	   This section covers the dynamic aspect of connection establishment.
744	   The tree described above is a useful conceptual and architectural
745	   model.  However, an implementation is unable to know the full tree
746	   before it is formed and many of the possible branches ultimately
747	   might not be used.

749	   There are three different approaches to racing the attempts for
750	   different nodes of the connection establishment tree:

752	   1.  Simultaneous

754	   2.  Staggered

756	   3.  Failover

758	   Each approach is appropriate in different use-cases and branch types.
759	   However, to avoid consuming unnecessary network resources,
760	   implementations should not use simultaneous racing as a default
761	   approach.

763	   The timing algorithms for racing should remain independent across
764	   branches of the tree.  Any timers or racing logic is isolated to a
765	   given parent node, and is not ordered precisely with regard to other
766	   children of other nodes.

768	4.3.1.  Simultaneous

770	   Simultaneous racing is when multiple alternate branches are started
771	   without waiting for any one branch to make progress before starting
772	   the next alternative.  This means the attempts are effectively
773	   simultaneous.  Simultaneous racing should be avoided by
774	   implementations, since it consumes extra network resources and
775	   establishes state that might not be used.

777	4.3.2.  Staggered

779	   Staggered racing can be used whenever a single node of the tree has
780	   multiple child nodes.  Based on the order determined when building
781	   the tree, the first child node will be initiated immediately,
782	   followed by the next child node after some delay.  Once that second
783	   child node is initiated, the third child node (if present) will begin
784	   after another delay, and so on until all child nodes have been
785	   initiated, or one of the child nodes successfully completes its
786	   negotiation.

788	   Staggered racing attempts can proceed in parallel.  Implementations
789	   should not terminate an earlier child connection attempt upon
790	   starting a secondary child.

792	   If a child node fails to establish connectivity (as in Section 4.4.1)
793	   before the delay time has expired for the next child, the next child
794	   should be started immediately.

796	   Staggered racing between IP addresses for a generic Connection should
797	   follow the Happy Eyeballs algorithm described in [RFC8305].
798	   [RFC8421] provides guidance for racing when performing Interactive
799	   Connectivity Establishment (ICE).

801	   Generally, the delay before starting a given child node ought to be
802	   based on the length of time the previously started child node is
803	   expected to take before it succeeds or makes progress in connection
804	   establishment.  Algorithms like Happy Eyeballs choose a delay based
805	   on how long the transport connection handshake is expected to take.
806	   When performing staggered races in multiple branch types (such as
807	   racing between network interfaces, and then racing between IP
808	   addresses), a longer delay may be chosen for some branch types.  For
809	   example, when racing between network interfaces, the delay should
810	   also take into account the amount of time it takes to prepare the
811	   network interface (such as radio association) and name resolution
812	   over that interface, in addition to the delay that would be added for
813	   a single transport connection handshake.

815	   Since the staggered delay can be chosen based on dynamic information,
816	   such as predicted round-trip time, implementations should define
817	   upper and lower bounds for delay times.  These bounds are
818	   implementation-specific, and may differ based on which branch type is
819	   being used.
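
   A staggered schedule with bounded delays could be computed as in the
   following sketch.  The function names and the bounds (10 ms and
   2000 ms) are illustrative assumptions, not values required by this
   document.  As a simplification, only a failure of the most recently
   started child is considered when starting the next child early.

```python
def clamp(value, lower, upper):
    """Restrict value to the range [lower, upper]."""
    return max(lower, min(upper, value))


def staggered_start_times(expected_ms, lower=10, upper=2000, failures=None):
    """Return the start time in milliseconds of each child, in ranked order.

    expected_ms[i] is how long child i is expected to need before it
    makes progress.  failures optionally maps a child index to the time,
    measured from that child's own start, at which it fails.
    """
    failures = failures or {}
    starts = [0]                      # the first child starts immediately
    for i in range(1, len(expected_ms)):
        prev = i - 1
        delay = clamp(expected_ms[prev], lower, upper)
        if prev in failures:
            # The previous child failed before the delay expired, so the
            # next child starts at the moment of failure.
            delay = min(delay, failures[prev])
        starts.append(starts[prev] + delay)
    return starts
```

   Earlier attempts are not cancelled when a later one starts; the
   schedule only determines when each additional attempt is launched.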

821	4.3.3.  Failover

823	   If an implementation or application has a strong preference for one
824	   branch over another, the branching node may choose to wait until one
825	   child has failed before starting the next.  Failure of a leaf node is
826	   determined by its protocol negotiation failing or timing out; failure
827	   of a parent branching node is determined by all of its children
828	   failing.

830	   An example in which failover is recommended is a race between a
831	   protocol stack that uses a proxy and a protocol stack that bypasses
832	   the proxy.  Failover is useful in case the proxy is down or
833	   misconfigured, but any more aggressive type of racing may end up
834	   unnecessarily avoiding a proxy that was preferred by policy.
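
   Failover reduces to trying the ranked children strictly one after
   another.  In this sketch, failover() and the attempt callback are
   hypothetical; attempt is assumed to return True on success and False
   on failure or timeout.

```python
def failover(candidates, attempt):
    """Try candidates in ranked order; start the next only after a failure.

    Returns (winner, tried), where winner is None if every candidate failed.
    """
    tried = []
    for candidate in candidates:
        tried.append(candidate)
        if attempt(candidate):
            return candidate, tried
    return None, tried


# The bypass is attempted only because the proxy attempt fails.
winner, tried = failover(["via-proxy", "direct"], lambda c: c == "direct")
```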

836	4.4.  Completing Establishment

838	   The process of connection establishment completes when one leaf node
839	   of the tree has successfully completed negotiation with the Remote
840	   Endpoint, or else all nodes of the tree have failed to connect.  The
841	   first leaf node to complete its connection is then used by the
842	   application to send and receive data.

844	   Successes and failures of a given attempt should be reported up to
845	   parent nodes (towards the trunk of the tree).  For example, in the
846	   following case, if 1.1.1 fails to connect, it reports the failure to
847	   1.1.  Since 1.1 has no other child nodes, it also has failed and
848	   reports that failure to 1.  Because 1.2 has not yet failed, 1 is not
849	   considered to have failed.  Since 1.2 has not yet started, it is
850	   started and the process continues.  Similarly, if 1.1.1 successfully
851	   connects, then it marks 1.1 as connected, which propagates to the
852	   trunk node 1.  At this point, the connection as a whole is considered
853	   to be successfully connected and ready to process application data.

855	   1 [www.example.com:80, Any, TCP]
856	     1.1 [www.example.com:80, Wi-Fi, TCP]
857	       1.1.1 [192.0.2.1:80, Wi-Fi, TCP]
858	     1.2 [www.example.com:80, LTE, TCP]
859	   ...
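
   The reporting of successes and failures towards the trunk could be
   modelled as in this sketch.  The Attempt class and its state names
   are hypothetical and cover only the propagation rules, not the
   starting of further children.

```python
class Attempt:
    """A node in the establishment tree (illustrative model only)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.state = "pending"
        if parent is not None:
            parent.children.append(self)

    def succeed(self):
        # A success always propagates to the trunk.
        self.state = "connected"
        if self.parent is not None:
            self.parent.succeed()

    def fail(self):
        # A parent fails only once all of its children have failed.
        self.state = "failed"
        parent = self.parent
        if parent is not None and all(c.state == "failed" for c in parent.children):
            parent.fail()


root = Attempt("1")
wifi = Attempt("1.1", root)
wifi_leaf = Attempt("1.1.1", wifi)
lte = Attempt("1.2", root)

wifi_leaf.fail()   # 1.1 fails as well; 1 stays pending because of 1.2
```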

861	   If a leaf node has successfully completed its connection, all other
862	   attempts should be made ineligible for use by the application for the
863	   original request.  New connection attempts that involve transmitting
864	   data on the network ought not to be started after another leaf node
865	   has already successfully completed, because the connection as a whole
866	   has now been established.  An implementation may choose to let
867	   certain handshakes and negotiations complete in order to gather
868	   metrics to influence future connections.  Keeping additional
869	   connections is generally not recommended since those attempts were
870	   slower to connect and may exhibit less desirable properties.

872	4.4.1.  Determining Successful Establishment

874	   Implementations may select the criteria by which a leaf node is
875	   considered to be successfully connected differently on a per-protocol
876	   basis.  If the only protocol being used is a transport protocol with
877	   a clear handshake, like TCP, then the obvious choice is to declare
878	   that node "connected" when the last packet of the three-way handshake
879	   has been received.  If the only protocol being used is a
880	   connectionless protocol, like UDP, the implementation may consider
881	   the node fully "connected" the moment it determines a route is
882	   present, before sending any packets on the network (see
883	   Section 4.6).

885	   For protocol stacks with multiple handshakes, the decision becomes
886	   more nuanced.  If the protocol stack involves both TLS and TCP, an
887	   implementation could determine that a leaf node is connected after
888	   the TCP handshake is complete, or it can wait for the TLS handshake
889	   to complete as well.  The benefit of declaring completion when the
890	   TCP handshake finishes, and thus stopping the race for other branches
891	   of the tree, is reduced burden on the network and Remote Endpoints
892	   from further connection attempts that are likely to be abandoned.  On
893	   the other hand, by waiting until the TLS handshake is complete, an
894	   implementation avoids the scenario in which a TCP handshake completes
895	   quickly, but TLS negotiation is either very slow or fails altogether
896	   in particular network conditions or to a particular endpoint.  To
897	   avoid the issue of TLS possibly failing, the implementation should
898	   not generate a Ready event for the Connection until TLS is
899	   established.

901	   If all of the leaf nodes fail to connect during racing, i.e., none of
902	   the configurations that satisfy all requirements given in the
903	   Transport Properties actually work over the available paths, then the
904	   transport system should notify the application with an InitiateError
905	   event.  An InitiateError event should also be generated in case the
906	   transport system finds no usable candidates to race.

908	4.5.  Establishing multiplexed connections

910	   Multiplexing several Connections over a single underlying transport
911	   connection requires that the Connections to be multiplexed belong to
912	   the same Connection Group (as is indicated by the application using
913	   the Clone call).  When the underlying transport connection supports
914	   multi-streaming, the Transport Services System can map each
915	   Connection in the Connection Group to a different stream.  Thus, when
916	   the Connections that are offered to an application by the Transport
917	   Services API are multiplexed, the Transport Services implementation
918	   can establish a new Connection by simply beginning to use a new
919	   stream of an already established transport Connection and there is no
920	   need for a connection establishment procedure.  This, then, also
921	   means that there may not be any "establishment" message (like a TCP
922	   SYN), but the application can simply start sending or receiving.
923	   Therefore, when the Initiate action of a Transport Services API is
924	   called without Messages being handed over, it cannot be guaranteed
925	   that the Remote Endpoint will have any way to know about this, and
926	   hence a passive endpoint's ConnectionReceived event might not be
927	   called until data is received.  Instead, calling the
928	   ConnectionReceived event could be delayed until the first Message
929	   arrives.

931	4.6.  Handling connectionless protocols

933	   While protocols that use an explicit handshake to validate a
934	   Connection to a peer can be used for racing multiple establishment
935	   attempts in parallel, connectionless protocols such as raw UDP do not
936	   offer a way to validate the presence of a peer or the usability of a
937	   Connection without application feedback.  An implementation should
938	   consider such a protocol stack to be established as soon as the
939	   Transport Services system has selected a path on which to send data.

941	   However, if a peer is not reachable over the network using the
942	   connectionless protocol, or data cannot be exchanged for any other
943	   reason, the application may want to attempt using another candidate
944	   Protocol Stack.  The implementation should maintain the list of other
945	   candidate Protocol Stacks that were eligible to use.

947	4.7.  Implementing listeners

949	   When an implementation is asked to Listen, it registers with the
950	   system to wait for incoming traffic to the Local Endpoint.  If no
951	   Local Endpoint is specified, the implementation should use an
952	   ephemeral port.

954	   If the Selection Properties do not require a single network interface
955	   or path, but allow the use of multiple paths, the Listener object
956	   should register for incoming traffic on all of the network interfaces
957	   or paths that conform to the Properties.  The set of available paths
958	   can change over time, so the implementation should monitor network
959	   path changes, and change the registration of the Listener across all
960	   usable paths as appropriate.  When using multiple paths, the Listener
961	   is generally expected to use the same port for listening on each.

963	   If the Selection Properties allow multiple protocols to be used for
964	   listening, and the implementation supports it, the Listener object
965	   should support receiving inbound connections for each eligible
966	   protocol on each eligible path.

968	4.7.1.  Implementing listeners for Connected Protocols

970	   Connected protocols such as TCP and TLS-over-TCP have a strong
971	   mapping between the Local and Remote Endpoints (four-tuple) and their
972	   protocol connection state.  These map into Connection objects.
973	   Whenever a new inbound handshake is being started, the Listener
974	   should generate a new Connection object and pass it to the
975	   application.

977	4.7.2.  Implementing listeners for Connectionless Protocols

979	   Connectionless protocols such as UDP and UDP-Lite generally do not
980	   provide the same mechanisms that connected protocols do to offer
981	   Connection objects.  Implementations should wait for incoming packets
982	   for connectionless protocols on a listening port and should perform
983	   four-tuple matching of packets to either existing Connection objects
984	   or the creation of new Connection objects.  On platforms with
985	   facilities to create a "virtual connection" for connectionless
986	   protocols, implementations should use these mechanisms to minimise
987	   the handling of datagrams intended for already created Connection
988	   objects.
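
   Four-tuple matching for a connectionless listener could look like the
   sketch below.  DatagramListener is a hypothetical class; it keeps
   per-Connection payload lists in place of real Connection objects and
   does no socket I/O.

```python
class DatagramListener:
    """Maps incoming datagrams to Connections by four-tuple."""

    def __init__(self):
        self.connections = {}      # (local, remote) -> received payloads
        self.new_connections = []  # four-tuples to announce to the application

    def on_datagram(self, local, remote, payload):
        """local and remote are (address, port) pairs."""
        key = (local, remote)
        if key not in self.connections:
            # First datagram from this peer: create a new Connection.
            self.connections[key] = []
            self.new_connections.append(key)
        self.connections[key].append(payload)
        return key


listener = DatagramListener()
local = ("192.0.2.1", 5000)
listener.on_datagram(local, ("198.51.100.7", 40000), b"a")
listener.on_datagram(local, ("198.51.100.7", 40000), b"b")
listener.on_datagram(local, ("198.51.100.9", 40001), b"c")
```

   Here the first two datagrams map to one Connection and the third
   creates a second one.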

990	4.7.3.  Implementing listeners for Multiplexed Protocols

992	   Protocols that provide multiplexing of streams into a single four-
993	   tuple can listen both for entirely new connections (a new HTTP/2
994	   stream on a new TCP connection, for example) and for new sub-
995	   connections (a new HTTP/2 stream on an existing connection).  If the
996	   abstraction of Connection presented to the application is mapped to
997	   the multiplexed stream, then the Listener should deliver new
998	   Connection objects in the same way for either case.  The
999	   implementation should allow the application to introspect the
1000	   Connection Group marked on the Connections to determine the grouping
1001	   of the multiplexing.

1003	5.  Implementing Sending and Receiving Data

1005	   The most basic mapping for sending a Message is an abstraction of
1006	   datagrams, in which the transport protocol naturally deals in
1007	   discrete packets.  Each Message here corresponds to a single
1008	   datagram.  Generally, these will be short enough that sending and
1009	   receiving will always use a complete Message.

1011	   For protocols that expose byte-streams, the only delineation provided
1012	   by the protocol is the end of the stream in a given direction.  Each
1013	   Message in this case corresponds to the entire stream of bytes in a
1014	   direction.  These Messages may be quite long, in which case they can
1015	   be sent in multiple parts.

1017	   Protocols that provide the framing (such as length-value protocols,
1018	   or protocols that use delimiters) may support Message sizes that do
1019	   not fit within a single datagram.  Each Message for framing protocols
1020	   corresponds to a single frame, which may be sent either as a complete
1021	   Message in the underlying protocol, or in multiple parts.
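
   As an illustration of a length-value framing protocol on top of a
   byte stream, the sketch below prefixes each Message with a 4-octet
   length in network byte order.  The wire format, frame(), and the
   Deframer class are assumptions made for this example only.

```python
import struct


def frame(message: bytes) -> bytes:
    """Prefix a Message with its length as a 4-octet unsigned integer."""
    return struct.pack("!I", len(message)) + message


class Deframer:
    """Reassembles complete Messages from arbitrary stream chunks."""

    def __init__(self):
        self._buffer = bytearray()

    def feed(self, data: bytes):
        """Add received bytes; return the Messages completed so far."""
        self._buffer += data
        messages = []
        while len(self._buffer) >= 4:
            (length,) = struct.unpack_from("!I", self._buffer, 0)
            if len(self._buffer) < 4 + length:
                break              # wait for the rest of this Message
            messages.append(bytes(self._buffer[4:4 + length]))
            del self._buffer[:4 + length]
        return messages
```

   The Deframer delivers a Message only once all of its bytes have
   arrived, regardless of how the stream was segmented in transit.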

1023	5.1.  Sending Messages

1025	   The effect of the application sending a Message is determined by the
1026	   top-level protocol in the established Protocol Stack.  That is, if
1027	   the top-level protocol provides an abstraction of framed messages
1028	   over a connection, the receiving application will be able to obtain
1029	   multiple Messages on that connection, even if the framing protocol is
1030	   built on a byte-stream protocol like TCP.

1032	5.1.1.  Message Properties

1034	   *  Lifetime: this should be implemented by removing the Message from
1035	      the queue of pending Messages after the Lifetime has expired.  A
1036	      queue of pending Messages within the transport system
1037	      implementation that have yet to be handed to the Protocol Stack
1038	      can always support this property, but once a Message has been sent
1039	      into the send buffer of a protocol, only certain protocols may
1040	      support removing a message.  For example, an implementation cannot
1041	      remove bytes from a TCP send buffer, while it can remove data from
1042	      a SCTP send buffer using the partial reliability extension
1043	      [RFC8303].  When there is no standing queue of Messages within the
1044	      system, and the Protocol Stack does not support the removal of a
1045	      Message from the stack's send buffer, this property may be
1046	      ignored.

1048	   *  Priority: this represents the ability to prioritize a Message over
1049	      other Messages.  This can be implemented by the system re-ordering
1050	      Messages that have yet to be handed to the Protocol Stack, or by
1051	      giving relative priority hints to protocols that support
1052	      priorities per Message.  For example, an implementation of HTTP/2
1053	      could choose to send Messages of different Priority on streams of
1054	      different priority.

1056	   *  Ordered: when this is false, this disables the requirement of in-
1057	      order-delivery for protocols that support configurable ordering.
1058	      When the protocol stack does not support configurable ordering,
1059	      this property may be ignored.

1061	   *  Safely Replayable: when this is true, this means that the Message
1062	      can be used by a transport mechanism that might transfer it
1063	      multiple times -- e.g., as a result of racing multiple transports
1064	      or as part of TCP Fast Open.  Also, protocols that do not protect
1065	      against duplicated messages, such as UDP (when used directly,
1066	      without a protocol layered atop), can only be used with Messages
1067	      that are Safely Replayable.  When a transport system is permitted
1068	      to replay messages, replay protection could be provided by the
1069	      application.

1071	   *  Final: when this is true, this means that the sender will not send
1072	      any further messages.  The Connection need not be closed (in case
1073	      the Protocol Stack supports half-close operation, like TCP).  Any
1074	      messages sent after a Final message will result in a SendError.

1076	   *  Corruption Protection Length: when this is set to any value other
1077	      than Full Coverage, it sets the minimum protection in protocols
1078	      that allow limiting the checksum length (e.g., UDP-Lite).  If the
1079	      protocol stack does not support checksum length limitation, this
1080	      property may be ignored.

1082	   *  Reliable Data Transfer (Message): When true, the property
1083	      specifies that the Message must be reliably transmitted.  When
1084	      false, and if unreliable transmission is supported by the
1085	      underlying protocol, then the Message should be unreliably
1086	      transmitted.  If the underlying protocol does not support
1087	      unreliable transmission, the Message should be reliably
1088	      transmitted.

1090	   *  Message Capacity Profile Override: When true, this expresses a
1091	      wish to override the Generic Connection Property Capacity Profile
1092	      for this Message.  Depending on the value, this can, for example,
1093	      be implemented by changing the DSCP value of the associated packet
1094	      (note that the guidelines in Section 6 of [RFC7657] apply; e.g.,
1095	      the DSCP value should not be changed for different packets within
1096	      a reliable transport protocol session or DCCP connection).

1098	   *  No Fragmentation: When set, this property limits the message size
1099	      to the Maximum Message Size Before Fragmentation or Segmentation
1100	      (see Section 10.1.7 of [I-D.ietf-taps-interface]).  Messages
1101	      larger than this size generate an error.  Setting this avoids
1102	      transport-layer segmentation or network-layer fragmentation.  When
1103	      used with transports running over IP version 4 the Don't Fragment
1104	      bit will be set to avoid on-path IP fragmentation ([RFC8304]).
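
   The Lifetime and Priority properties can both be honoured in a queue
   of Messages that have not yet been handed to the Protocol Stack.  The
   sketch below is illustrative: PendingQueue is a hypothetical class,
   time is passed in explicitly, and it assumes that a lower Priority
   value means higher priority.

```python
import heapq
import itertools


class PendingQueue:
    """Messages waiting to be handed to the Protocol Stack."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # keeps FIFO order within a priority

    def enqueue(self, payload, now, priority=100, lifetime=None):
        expires = None if lifetime is None else now + lifetime
        heapq.heappush(self._heap,
                       (priority, next(self._counter), expires, payload))

    def dequeue(self, now):
        """Return the next unexpired Message, or None if there is none."""
        while self._heap:
            _, _, expires, payload = heapq.heappop(self._heap)
            if expires is None or now < expires:
                return payload
            # Lifetime expired while queued: drop the Message silently.
        return None
```

   Once a Message has left this queue and entered a protocol's send
   buffer, removing it depends on the protocol, as described for the
   Lifetime property.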

1106	5.1.2.  Send Completion

1108	   The application should be notified whenever a Message or partial
1109	   Message has been consumed by the Protocol Stack, or has failed to
1110	   send.  The time at which a Message is considered to have been
1111	   consumed by the Protocol Stack may vary depending on the protocol.
1112	   For example, for a basic datagram protocol like UDP, this may
1113	   correspond to the time when the packet is sent into the interface
1114	   driver.  For a protocol that buffers data in queues, like TCP, this
1115	   may correspond to when the data has entered the send buffer.  The
1116	   time at which a Message failed to send is when the Transport
1117	   Services implementation (including the Protocol Stack) has not
1118	   successfully sent the entire Message content or partial Message
1119	   content on any open candidate connection; this can depend on
1120	   protocol-specific timeouts.

1122	5.1.3.  Batching Sends

1124	   Since sending a Message may involve a context switch between the
1125	   application and the Transport Services system, sending patterns that
1126	   involve multiple small Messages can incur high overhead if each needs
1127	   to be enqueued separately.  To avoid this, the application can
1128	   indicate a batch of Send actions through the API.  When this is used,
1129	   the implementation can defer the processing of Messages until the
1130	   batch is complete.
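
   As a minimal sketch of deferred processing, the following Python
   fragment assumes a hypothetical SendQueue class (not part of the
   Transport Services API).  Messages sent while a batch is open are
   held back and processed in a single pass when the batch ends.

```python
from contextlib import contextmanager


class SendQueue:
    """Hypothetical send path that can defer processing during a batch."""

    def __init__(self):
        self.pending = []       # Messages enqueued but not yet processed
        self.processed = []     # one list of Messages per processing pass
        self._batch_depth = 0   # > 0 while a batch is open

    def send(self, message: bytes) -> None:
        self.pending.append(message)
        if self._batch_depth == 0:
            self._flush()       # no batch open: process immediately

    @contextmanager
    def batch(self):
        self._batch_depth += 1
        try:
            yield self
        finally:
            self._batch_depth -= 1
            if self._batch_depth == 0:
                self._flush()   # batch complete: process everything once

    def _flush(self) -> None:
        if self.pending:
            self.processed.append(list(self.pending))
            self.pending.clear()


queue = SendQueue()
with queue.batch():
    queue.send(b"a")
    queue.send(b"b")
queue.send(b"c")
```

   In this sketch the two batched Messages share one processing pass,
   while the Message sent outside the batch is processed on its own.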

1132	5.2.  Receiving Messages

1134	   Similar to sending, Receiving a Message is determined by the top-
1135	   level protocol in the established Protocol Stack.  The main
1136	   difference with Receiving is that the size and boundaries of the
1137	   Message are not known beforehand.  The application can communicate in
1138	   its Receive action the parameters for the Message, which can help the
1139	   Transport Services implementation know how much data to deliver and
1140	   when.  For example, if the application only wants to receive a
1141	   complete Message, the implementation should wait until an entire
1142	   Message (datagram, stream, or frame) is read before delivering any
1143	   Message content to the application.  This requires the implementation
1144	   to understand where messages end, either via a supplied deframer or
1145	   because the top-level protocol in the established Protocol Stack
1146	   preserves message boundaries.  If the top-level protocol only
1147	   supports a byte-stream and no framers were supplied, the application
1148	   can control the flow of received data by specifying the minimum
1149	   number of bytes of Message content it wants to receive at one time.
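
   The byte-stream case can be sketched as follows.  The function name
   try_receive and its parameters are illustrative assumptions; the
   sketch only shows how a minimum and maximum length could gate
   delivery from a receive buffer.

```python
def try_receive(buffer: bytearray, min_incomplete_length: int,
                max_length: int):
    """Return the next chunk to deliver, or None to keep waiting.

    Hypothetical helper: 'buffer' holds bytes already read from a
    byte-stream transport that have not yet been delivered.
    """
    if len(buffer) < min_incomplete_length:
        return None                      # not enough data buffered yet
    take = min(len(buffer), max_length)  # never exceed the maximum
    chunk = bytes(buffer[:take])
    del buffer[:take]                    # consume what was delivered
    return chunk
```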

1151	   If a Connection finishes before a requested Receive action can be
1152	   satisfied, the Transport Services API should deliver any partial
1153	   Message content outstanding, or if none is available, an indication
1154	   that there will be no more received Messages.

1156	5.3.  Handling of data for fast-open protocols

1158	   Several protocols allow sending higher-level protocol or application
1159	   data during their protocol establishment, such as TCP Fast Open
1160	   [RFC7413] and TLS 1.3 [RFC8446].  This approach is referred to as
1161	   sending Zero-RTT (0-RTT) data.  This is a desirable feature, but
1162	   poses challenges to an implementation that uses racing during
1163	   connection establishment.

1165	   The amount of data that can be sent as 0-RTT data varies by protocol
1166	   and can be queried by the application using the Maximum Message Size
1167	   Concurrent with Connection Establishment Connection Property.  An
1168	   implementation can set this property according to the protocols that
1169	   it will race based on the given Selection Properties when the
1170	   application requests to establish a connection.

1172	   If the application has 0-RTT data to send in any protocol handshakes,
1173	   it needs to provide this data before the handshakes have begun.  When
1174	   racing, this means that the data should be provided before the
1175	   process of connection establishment has begun.  If the application
1176	   wants to send 0-RTT data, it must indicate this to the implementation
1177	   by setting the Safely Replayable send parameter to true when sending
1178	   the data.  In general, 0-RTT data may be replayed (for example, if a
1179	   TCP SYN contains data, and the SYN is retransmitted, the data will be
1180	   retransmitted as well but may be considered as a new connection
1181	   instead of a retransmission).  Also, when racing connections,
1182	   different leaf nodes have the opportunity to send the same data
1183	   independently.  If data is truly safely replayable, this should be
1184	   permissible.

1186	   Once the application has provided its 0-RTT data, a Transport
1187	   Services implementation should keep a copy of this data and provide
1188	   it to each new leaf node that is started and for which a 0-RTT
1189	   protocol is being used.
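
   A minimal sketch of this bookkeeping is shown below.  The classes
   LeafNode and EarlyDataStore are hypothetical names chosen for
   illustration; the sketch keeps one copy of safely replayable data
   and hands it to every leaf node that uses a 0-RTT protocol.

```python
class LeafNode:
    """Hypothetical racing candidate."""

    def __init__(self, name: str, supports_zero_rtt: bool):
        self.name = name
        self.supports_zero_rtt = supports_zero_rtt
        self.early_data = None


class EarlyDataStore:
    """Keeps the application's 0-RTT data for the duration of racing."""

    def __init__(self):
        self._early_data = None

    def provide(self, data: bytes, safely_replayable: bool) -> None:
        # Only data marked Safely Replayable may be sent as 0-RTT data.
        if safely_replayable:
            self._early_data = bytes(data)   # keep our own copy

    def start_leaf(self, leaf: LeafNode) -> LeafNode:
        # Each new 0-RTT-capable leaf node receives the same data.
        if leaf.supports_zero_rtt and self._early_data is not None:
            leaf.early_data = self._early_data
        return leaf
```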

1191	   It is also possible that protocol stacks within a particular leaf
1192	   node use 0-RTT handshakes without any safely replayable application
1193	   data.  For example, TCP Fast Open could use a Client Hello from TLS
1194	   as its 0-RTT data, shortening the cumulative handshake time.

1196	   0-RTT handshakes often rely on previous state, such as TCP Fast Open
1197	   cookies, previously established TLS tickets, or out-of-band
1198	   distributed pre-shared keys (PSKs).  Implementations should be aware
1199	   of security concerns around using these tokens across multiple
1200	   addresses or paths when racing.  In the case of TLS, any given ticket
1201	   or PSK should only be used on one leaf node, since servers will
1202	   likely reject duplicate tickets in order to prevent replays (see
1203	   Section 8.1 of [RFC8446]).  If implementations have multiple tickets
1204	   available from a previous connection, each leaf node attempt can use
1205	   a different ticket.  In effect, each leaf node will send the same
1206	   early application data, yet encoded (encrypted) differently on the
1207	   wire.

1209	6.  Implementing Message Framers

1211	   Message Framers are functions that define simple transformations
1212	   between application Message data and raw transport protocol data.  A
1213	   Framer can encapsulate or encode outbound Messages, and decapsulate
1214	   or decode inbound data into Messages.

1216	   While many protocols can be represented as Message Framers, for the
1217	   purposes of the Transport Services API, these are ways for
1218	   applications or application frameworks to define their own Message
1219	   parsing to be included within a Connection's Protocol Stack.  As an
1220	   example, TLS is exposed as a protocol natively supported by the
1221	   Transport Services API, even though it could also serve the purpose
1222	   of framing data over TCP.

1224	   Most Message Framers fall into one of two categories:

1226	   *  Header-prefixed record formats, such as a basic Type-Length-Value
1227	      (TLV) structure

1229	   *  Delimiter-separated formats, such as HTTP/1.1.

1231	   Common Message Framers can be provided by a Transport Services
1232	   implementation, but an implementation ought to allow custom Message
1233	   Framers to be defined by the application or some other piece of
1234	   software.  This section describes one possible API for defining
1235	   Message Framers as an example.

1237	6.1.  Defining Message Framers

1239	   A Message Framer is primarily defined by the code that handles events
1240	   for a framer implementation, specifically how it handles inbound and
1241	   outbound data parsing.  The function that implements custom framing
1242	   logic will be referred to as the "framer implementation", which may
1243	   be provided by a Transport Services implementation or the application
1244	   itself.  The Message Framer refers to the object or function within
1245	   the main Connection implementation that delivers events to the custom
1246	   framer implementation whenever data is ready to be parsed or framed.

1248	   The Transport Services implementation needs to ensure that all of the
1249	   events and actions taken on a Message Framer are synchronized to
1250	   ensure consistent behavior.  For example, some of the actions defined
1251	   below (such as PrependFramer and StartPassthrough) modify how data
1252	   flows in a protocol stack, and require synchronization with sending
1253	   and parsing data in the Message Framer.

1255	   When a Connection establishment attempt begins, an event can be
1256	   delivered to notify the framer implementation that a new Connection
1257	   is being created.  Similarly, a stop event can be delivered when a
1258	   Connection is being torn down.  The framer implementation can use the
1259	   Connection object to look up specific properties of the Connection or
1260	   the network being used that may influence how to frame Messages.

1262	   MessageFramer -> Start(Connection)
1263	   MessageFramer -> Stop(Connection)

1265	   When a Message Framer generates a Start event, the framer
1266	   implementation has the opportunity to start writing some data prior
1267	   to the Connection delivering its Ready event.  This allows the
1268	   implementation to communicate control data to the Remote Endpoint
1269	   that can be used to parse Messages.

1271	   MessageFramer.MakeConnectionReady(Connection)

1273	   Similarly, when a Message Framer generates a Stop event, the framer
1274	   implementation has the opportunity to write some final data or clean
1275	   up its local state before the Closed event is delivered to the
1276	   Application.  The framer implementation can indicate that it has
1277	   finished with this.

1279	   MessageFramer.MakeConnectionClosed(Connection)

1281	   At any time, if the framer implementation encounters a fatal error,
1282	   it can also cause the Connection to fail and provide an error.

1284	   MessageFramer.FailConnection(Connection, Error)

1285	   Should the framer implementation deem the candidate selected during
1286	   racing unsuitable, it can signal this to the Transport Services API
1287	   by failing the Connection prior to marking it as ready.  If there are
1288	   no other candidates available, the Connection will fail.  Otherwise,
1289	   the Connection will select a different candidate and the Message
1290	   Framer will generate a new Start event.

1292	   Before an implementation marks a Message Framer as ready, it can also
1293	   dynamically add a protocol or framer above it in the stack.  This
1294	   allows protocols that need to add TLS conditionally, like STARTTLS
1295	   [RFC3207], to modify the Protocol Stack based on a handshake result.

1297	   otherFramer := NewMessageFramer()
1298	   MessageFramer.PrependFramer(Connection, otherFramer)

1300	   A Message Framer might also choose to go into a passthrough mode once
1301	   an initial exchange or handshake has been completed, such as the
1302	   STARTTLS case mentioned above.  This can also be useful for proxy
1303	   protocols like SOCKS [RFC1928] or HTTP CONNECT [RFC7230].  In such
1304	   cases, a Message Framer implementation can intercept sending and
1305	   receiving of messages at first, but then indicate that no more
1306	   processing is needed.

1308	   MessageFramer.StartPassthrough()

1310	6.2.  Sender-side Message Framing

1312	   Message Framers generate an event whenever a Connection sends a new
1313	   Message.

1315	MessageFramer -> NewSentMessage<Connection, MessageData, MessageContext, IsEndOfMessage>

1317	   Upon receiving this event, a framer implementation is responsible for
1318	   performing any necessary transformations and sending the resulting
1319	   data back to the Message Framer, which will in turn send it to the
1320	   next protocol.  Implementations SHOULD ensure that there is a way to
1321	   pass the original data through without copying to improve
1322	   performance.

1324	   MessageFramer.Send(Connection, Data)

1326	   To provide an example, a simple protocol that adds a length as a
1327	   header would receive the NewSentMessage event, create a data
1328	   representation of the length of the Message data, and then send a
1329	   block of data that is the concatenation of the length header and the
1330	   original Message data.
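
   The length-header example can be sketched in Python as follows.
   The 4-byte big-endian header and the function name frame_message
   are assumptions made for illustration; the returned bytes are what
   a framer implementation would hand to MessageFramer.Send.

```python
import struct

# Assumed header format: 4-byte unsigned length in network byte order.
HEADER = struct.Struct("!I")


def frame_message(message_data: bytes) -> bytes:
    """Prefix the Message data with its length."""
    return HEADER.pack(len(message_data)) + message_data
```

   A production framer would likely avoid the concatenation shown here
   so that the original Message data can be passed through without
   copying.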

1332	6.3.  Receiver-side Message Framing

1334	   In order to parse a received flow of data into Messages, the Message
1335	   Framer notifies the framer implementation whenever new data is
1336	   available to parse.

1338	   MessageFramer -> HandleReceivedData<Connection>

1340	   Upon receiving this event, the framer implementation can inspect the
1341	   inbound data.  The data is parsed from a particular cursor
1342	   representing the unprocessed data.  The application requests a
1343	   specific amount of data it needs to have available in order to parse.
1344	   If the data is not available, the parse fails.

1346	MessageFramer.Parse(Connection, MinimumIncompleteLength, MaximumLength) -> (Data, MessageContext, IsEndOfMessage)

1348	   The framer implementation can directly advance the receive cursor
1349	   once it has parsed data to effectively discard data (for example,
1350	   discard a header once the content has been parsed).

1352	   To deliver a Message to the application, the framer implementation
1353	   can either directly deliver data that it has allocated, or deliver a
1354	   range of data directly from the underlying transport and
1355	   simultaneously advance the receive cursor.

1357	MessageFramer.AdvanceReceiveCursor(Connection, Length)
1358	MessageFramer.DeliverAndAdvanceReceiveCursor(Connection, MessageContext, Length, IsEndOfMessage)
1359	MessageFramer.Deliver(Connection, MessageContext, Data, IsEndOfMessage)

1361	   Note that MessageFramer.DeliverAndAdvanceReceiveCursor allows the
1362	   framer implementation to earmark bytes as part of a Message even
1363	   before they are received by the transport.  This allows the delivery
1364	   of very large Messages without requiring the implementation to
1365	   directly inspect all of the bytes.

1367	   To provide an example, a simple protocol that parses a length as a
1368	   header value would receive the HandleReceivedData event, and call
1369	   Parse with a minimum and maximum set to the length of the header
1370	   field.  Once the parse succeeded, it would call AdvanceReceiveCursor
1371	   with the length of the header field, and then call
1372	   DeliverAndAdvanceReceiveCursor with the length of the body that was
1373	   parsed from the header, marking the new Message as complete.
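
   The receive-side example can be sketched as follows.  FramerStub is
   a hypothetical stand-in for the Message Framer, with simplified
   method signatures, and the 4-byte big-endian header is an assumed
   format.  Unlike DeliverAndAdvanceReceiveCursor as described above,
   this sketch waits until the whole body is buffered before
   delivering it.

```python
import struct

# Assumed header format: 4-byte unsigned length in network byte order.
HEADER = struct.Struct("!I")


class FramerStub:
    """Hypothetical stand-in for the Message Framer's receive side."""

    def __init__(self):
        self.buffer = bytearray()   # unprocessed data after the cursor
        self.delivered = []         # complete Messages handed upward

    def parse(self, minimum_incomplete_length: int, maximum_length: int):
        if len(self.buffer) < minimum_incomplete_length:
            return None             # parse fails: not enough data
        return bytes(self.buffer[:maximum_length])

    def advance_receive_cursor(self, length: int) -> None:
        del self.buffer[:length]

    def deliver_and_advance_receive_cursor(self, length: int) -> None:
        self.delivered.append(bytes(self.buffer[:length]))
        del self.buffer[:length]


def handle_received_data(framer: FramerStub) -> None:
    """Deliver every complete length-prefixed Message in the buffer."""
    while True:
        header = framer.parse(HEADER.size, HEADER.size)
        if header is None:
            return
        (body_length,) = HEADER.unpack(header)
        if len(framer.buffer) < HEADER.size + body_length:
            return                  # simplification: wait for full body
        framer.advance_receive_cursor(HEADER.size)   # discard header
        framer.deliver_and_advance_receive_cursor(body_length)
```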

1375	7.  Implementing Connection Management

1377	   Once a Connection is established, the Transport Services API allows
1378	   applications to interact with the Connection by modifying or
1379	   inspecting Connection Properties.  A Connection can also generate
1380	   events in the form of Soft Errors.

1382	   The set of Connection Properties that are supported for setting and
1383	   getting on a Connection are described in [I-D.ietf-taps-interface].
1384	   For any properties that are generic, and thus could apply to all
1385	   protocols being used by a Connection, the Transport Services
1386	   implementation should store the properties in storage common to all
1387	   protocols, and notify all protocol instances in the Protocol Stack
1388	   whenever the properties have been modified by the application.  For
1389	   protocol-specific properties, such as the User Timeout that applies to
1390	   TCP, the Transport Services implementation only needs to update the
1391	   relevant protocol instance.

1393	   If an error is encountered in setting a property (for example, if the
1394	   application tries to set a TCP-specific property on a Connection that
1395	   is not using TCP), the action should fail gracefully.  The
1396	   application may be informed of the error, but the Connection itself
1397	   should not be terminated.
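
   The property handling described above can be sketched as follows.
   The class names, the GENERIC set, and the property names used are
   illustrative placeholders rather than a definitive list.  Generic
   properties are kept in common storage and announced to every
   protocol instance; a protocol-specific property that no instance
   understands is reported as a failure without closing the
   Connection.

```python
class ProtocolInstance:
    """Hypothetical protocol instance in a Protocol Stack."""

    def __init__(self, name, specific_properties):
        self.name = name
        self.specific_properties = set(specific_properties)
        self.seen = {}              # properties this instance was told about

    def property_changed(self, key, value):
        self.seen[key] = value


class Connection:
    GENERIC = {"connPriority", "connTimeout"}   # illustrative subset

    def __init__(self, stack):
        self.stack = stack
        self.generic_properties = {}   # storage common to all protocols
        self.open = True

    def set_property(self, key, value) -> bool:
        if key in self.GENERIC:
            self.generic_properties[key] = value
            for instance in self.stack:          # notify every instance
                instance.property_changed(key, value)
            return True
        targets = [i for i in self.stack if key in i.specific_properties]
        if not targets:
            return False    # fail gracefully; the Connection stays open
        for instance in targets:                 # only relevant instances
            instance.property_changed(key, value)
        return True
```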

1399	   The Transport Services API should allow protocol instances in the
1400	   Protocol Stack to pass up arbitrary generic or protocol-specific
1401	   errors that can be delivered to the application as Soft Errors.
1402	   These allow the application to be informed of ICMP errors, and other
1403	   similar events.

1405	7.1.  Pooled Connection

1407	   For applications that do not need in-order delivery of Messages, the
1408	   Transport Services implementation may distribute Messages of a single
1409	   Connection across several underlying transport connections or
1410	   multiple streams of multi-streaming connections between endpoints, as
1411	   long as all of these satisfy the Selection Properties.  The Transport
1412	   Services implementation will then hide this connection management and
1413	   only expose a single Connection object, which we here call a "Pooled
1414	   Connection".  This is in contrast to Connection Groups, which
1415	   explicitly expose combined treatment of Connections, giving the
1416	   application control over multiplexing, for example.

1418	   Pooled Connections can be useful when the application using the
1419	   Transport Services system implements a protocol such as HTTP, which
1420	   employs request/response pairs and does not require in-order delivery
1421	   of responses.  This enables implementations of Transport Services
1422	   systems to realize transparent connection coalescing, connection
1423	   migration, and to perform per-message endpoint and path selection by
1424	   choosing among multiple underlying connections.

1426	7.2.  Handling Path Changes

1428	   When a path change occurs, e.g., when the IP address of an interface
1429	   changes or a new interface becomes available, the Transport Services
1430	   implementation is responsible for notifying the Protocol Instance of
1431	   the change.  The path change may interrupt connectivity on a path for
1432	   an active connection or provide an opportunity for a transport that
1433	   supports multipath or migration to adapt to the new paths.  Note
1434	   that, in the model of the Transport Services API, migration is
1435	   considered a part of multipath connectivity; it is just a limiting
1436	   policy on multipath usage.  If the multipath Selection Property is
1437	   set to Disabled, migration is disallowed.

1439	   For protocols that do not support multipath or migration, the
1440	   Protocol Instances should be informed of the path change, but should
1441	   not be forcibly disconnected if the previously used path becomes
1442	   unavailable.  There are many common user scenarios that can lead to a
1443	   path becoming temporarily unavailable, and then recovering before the
1444	   transport protocol reaches a timeout error.  These are particularly
1445	   common using mobile devices.  Examples include: an Ethernet cable
1446	   becoming unplugged and then plugged back in; a device losing a Wi-Fi
1447	   signal while a user is in an elevator, and reattaching when the user
1448	   leaves the elevator; and a user losing the radio signal while riding
1449	   a train through a tunnel.  If the device is able to rejoin a network
1450	   with the same IP address, a stateful transport connection can
1451	   generally resume.  Thus, while it is useful for a Protocol Instance
1452	   to be aware of a temporary loss of connectivity, the Transport
1453	   Services implementation should not aggressively close connections in
1454	   these scenarios.

1456	   If the Protocol Stack includes a transport protocol that supports
1457	   multipath connectivity, the Transport Services implementation should
1458	   also inform the Protocol Instance of potentially new paths that
1459	   become permissible based on the multipath Selection Property and the
1460	   multipath-policy Connection Property choices made by the application.
1461	   A protocol can then establish new subflows over new paths while an
1462	   active path is still available or, if migration is supported, also
1463	   after a break has been detected, and should attempt to tear down
1464	   subflows over paths that are no longer used.  The Connection Property
1465	   multipath-policy of the Transport Services API allows an application
1466	   to indicate when and how different paths should be used.  However,
1467	   detailed handling of these policies is still implementation-specific.
1468	   For example, if the multipath Selection Property is set to active,
1469	   the decision about when to create a new path or to announce a new
1470	   path or set of paths to the Remote Endpoint, e.g., in the form of
1471	   additional IP addresses, is implementation-specific.  If the Protocol
1472	   Stack includes a transport protocol that does not support multipath,
1473	   but does support migrating between paths, the update to the set of
1474	   available paths can trigger the connection to be migrated.

1476	   For Pooled Connections (Section 7.1), the Transport Services
1477	   implementation may add connections over new paths to the pool if
1478	   permissible based on the multipath policy and Selection Properties.
1479	   In case a previously used path becomes unavailable, the transport
1480	   system may disconnect all connections that require this path, but
1481	   should not disconnect the pooled connection object exposed to the
1482	   application.  The strategy to do so is implementation-specific, but
1483	   should be consistent with the behavior of multipath transports.

1485	8.  Implementing Connection Termination

1487	   With TCP, when an application closes a connection, this means that it
1488	   has no more data to send (but expects all data that has been handed
1489	   over to be reliably delivered).  However, with TCP only, "close" does
1490	   not mean that the application will stop receiving data.  This is
1491	   related to TCP's ability to support half-closed connections.

1493	   SCTP is an example of a protocol that does not support such half-
1494	   closed connections.  Hence, with SCTP, the meaning of "close" is
1495	   stricter: an application has no more data to send (but expects all
1496	   data that has been handed over to be reliably delivered), and will
1497	   also not receive any more data.

1499	   Implementing a protocol-independent transport system means that the
1500	   exposed semantics must be the strictest subset of the semantics of
1501	   all supported protocols.  Hence, as is common with all reliable
1502	   transport protocols, after a Close action, the application can expect
1503	   to have its reliability requirements honored regarding the data
1504	   provided to the Transport Services API, but it cannot expect to be
1505	   able to read any more data after calling Close.

1507	   Abort differs from Close only in that no guarantees are given
1508	   regarding any data that the application sent to the Transport
1509	   Services API before calling Abort.

1511	   As explained in Section 4.5, when a new stream is multiplexed on an
1512	   already existing connection of a Transport Protocol Instance, there
1513	   is no need for a connection establishment procedure.  Because the
1514	   Connections that are offered by a Transport Services implementation
1515	   can be implemented as streams that are multiplexed on a transport
1516	   protocol's connection, it cannot be guaranteed that an Initiate
1517	   action from one endpoint provokes a ConnectionReceived event at its
1518	   peer.

1520	   For Close (provoking a Finished event) and Abort (provoking a
1521	   ConnectionError event), the same logic applies: while it is desirable
1522	   to be informed when a peer closes or aborts a Connection, whether
1523	   this is possible depends on the underlying protocol, and no
1524	   guarantees can be given.  With SCTP, the transport system can use the
1525	   stream reset procedure to cause a Finished event upon a Close action
1526	   from the peer [NEAT-flow-mapping].

1528	9.  Cached State

1530	   Beyond a single Connection's lifetime, it is useful for an
1531	   implementation to keep state and history.  This cached state can help
1532	   improve future Connection establishment due to re-using results and
1533	   credentials, and favoring paths and protocols that performed well in
1534	   the past.

1536	   Cached state may be associated with different endpoints for the same
1537	   Connection, depending on the protocol generating the cached content.
1538	   For example, session tickets for TLS are associated with specific
1539	   endpoints, and thus should be cached based on a Connection's hostname
1540	   endpoint (if applicable).  However, performance characteristics of a
1541	   path are more likely tied to the IP address and subnet being used.

1543	9.1.  Protocol state caches

1545	   Some protocols will have long-term state to be cached in association
1546	   with endpoints.  This state often has some time after which it is
1547	   expired, so the implementation should allow each protocol to specify
1548	   an expiration for cached content.

1550	   Examples of cached protocol state include:

1552	   *  The DNS protocol can cache resolution answers (A and AAAA queries,
1553	      for example), associated with a Time To Live (TTL) to be used for
1554	      future hostname resolutions without querying the DNS resolver
1555	      again.

1557	   *  TLS caches session state and tickets based on a hostname, which
1558	      can be used for resuming sessions with a server.

1560	   *  TCP can cache cookies for use in TCP Fast Open.

1562	   Cached protocol state is primarily used during Connection
1563	   establishment for a single Protocol Stack, but may be used to
1564	   influence an implementation's preference between several candidate
1565	   Protocol Stacks.  For example, if two IP address endpoints are
1566	   otherwise equally preferred, an implementation may choose to attempt
1567	   a connection to an address for which it has a TCP Fast Open cookie.

1569	   Applications can use the Transport Services API to request that a
1570	   Connection Group maintain a separate cache for protocol state.
1571	   Connections in the group will not use cached state from connections
1572	   outside the group, and connections outside the group will not use
1573	   state cached from connections inside the group.  This may be
1574	   necessary, for example, if application-layer identifiers rotate and
1575	   clients wish to avoid linkability via trackable TLS tickets or TFO
1576	   cookies.
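
   A minimal sketch of such a cache is shown below.  The class name
   ProtocolStateCache and its key layout are assumptions; the sketch
   shows per-entry expiration and a scope value that keeps state
   cached for one Connection Group separate from all other state.

```python
import time


class ProtocolStateCache:
    """Hypothetical cache of protocol state with per-entry expiration."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        # (scope, protocol, endpoint) -> (value, expires_at)
        self._entries = {}

    def put(self, protocol, endpoint, value, lifetime, scope=None):
        # 'scope' identifies a Connection Group with a separate cache;
        # None stands for the shared cache.
        expires_at = self._clock() + lifetime
        self._entries[(scope, protocol, endpoint)] = (value, expires_at)

    def get(self, protocol, endpoint, scope=None):
        key = (scope, protocol, endpoint)
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]      # expired: drop and report a miss
            return None
        return value
```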

1578	9.2.  Performance caches

1580	   In addition to protocol state, Protocol Instances should provide data
1581	   into a performance-oriented cache to help guide future protocol and
1582	   path selection.  Some performance information can be gathered
1583	   generically across several protocols to allow predictive comparisons
1584	   between protocols on given paths:

1586	   *  Observed Round Trip Time

1588	   *  Connection Establishment latency

1590	   *  Connection Establishment success rate

1592	   These items can be cached on a per-address and per-subnet
1593	   granularity, and averaged between different values.  The information
1594	   should be cached on a per-network basis, since it is expected that
1595	   different network attachments will have different performance
1596	   characteristics.  Besides Protocol Instances, other system entities
1597	   may also provide data into performance-oriented caches.  This could,
1598	   for instance, be signal strength information reported by radio modems
1599	   like Wi-Fi and mobile broadband or information about the battery-
1600	   level of the device.  Furthermore, the system may cache the observed
1601	   maximum throughput on a path as an estimate of the available
1602	   bandwidth.

1604	   An implementation should use this information, when possible, to
1605	   influence preference between candidate paths, endpoints, and protocol
1606	   options.  Eligible options that historically had significantly better
1607	   performance than others should be selected first when gathering
1608	   candidates (see Section 4.2) to ensure better performance for the
1609	   application.

1611	   The reasonable lifetime for cached performance values will vary
1612	   depending on the nature of the value.  Certain information, like the
1613	   connection establishment success rate to a Remote Endpoint using a
1614	   given protocol stack, can be stored for a long period of time (hours
1615	   or longer), since it is expected that the capabilities of the Remote
1616	   Endpoint are not changing very quickly.  On the other hand, the Round
1617	   Trip Time observed by TCP over a particular network path may vary
1618	   over a relatively short time interval.  For such values, the
1619	   implementation should remove them from the cache more quickly, or
1620	   treat older values with less confidence/weight.
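
   One way to treat older values with less weight is an exponentially
   decayed average, sketched below.  The function name and the
   half-life parameter are assumptions; the document does not
   prescribe a particular weighting scheme.

```python
def weighted_estimate(samples, now: float, half_life: float):
    """Average (timestamp, value) samples, halving a sample's weight
    for every 'half_life' units of age.  Returns None if no samples."""
    total_weight = 0.0
    total = 0.0
    for timestamp, value in samples:
        age = max(0.0, now - timestamp)
        weight = 0.5 ** (age / half_life)   # older samples count less
        total_weight += weight
        total += weight * value
    return total / total_weight if total_weight else None
```

   A short half-life would suit values such as Round Trip Time, while a
   long half-life would suit values such as the connection
   establishment success rate.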

1622	   [I-D.ietf-tcpm-2140bis] provides guidance about sharing of TCP
1623	   Control Block information between connections on initialization.

1625	10.  Specific Transport Protocol Considerations

1627	   Each protocol that is supported by a Transport Services
1628	   implementation should have a well-defined API mapping.  API mappings
1629	   for a protocol are important for Connections in which a given
1630	   protocol is the "top" of the Protocol Stack.  For example, the
1631	   mapping of the Send function for TCP applies to Connections in which
1632	   the application directly sends over TCP.

1634	   Each protocol has a notion of Connectedness.  Possible values for
1635	   Connectedness are:

1637	   *  Connectionless.  Connectionless protocols do not establish
1638	      explicit state between endpoints, and do not perform a handshake
1639	      during Connection establishment.

1641	   *  Connected.  Connected protocols establish state between endpoints,
1642	      and perform a handshake during Connection establishment.  The
1643	      handshake may be 0-RTT to send data or resume a session, but
1644	      bidirectional traffic is required to confirm connectedness.

1646	   *  Multiplexing Connected.  Multiplexing Connected protocols share
1647	      properties with Connected protocols, but also explicitly support
1648	      opening multiple application-level flows.  This means that they
1649	      can support cloning new Connection objects without a new explicit
1650	      handshake.

1652	   Protocols also define a notion of Data Unit.  Possible values for
1653	   Data Unit are:

1655	   *  Byte-stream.  Byte-stream protocols do not define any Message
1656	      boundaries of their own apart from the end of a stream in each
1657	      direction.

1659	   *  Datagram.  Datagram protocols define Message boundaries at the
1660	      same level of transmission, such that only complete (not partial)
1661	      Messages are supported.

1663	   *  Message.  Message protocols support Message boundaries that can be
1664	      sent and received either as complete or partial Messages.  Maximum
1665	      Message lengths can be defined, and Messages can be partially
1666	      reliable.
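
   The two notions above could be represented as simple enumerations,
   as in the following sketch.  The ProtocolDescription class and its
   helper methods are hypothetical; the TCP values are those given in
   the TCP mapping in this document.

```python
from dataclasses import dataclass
from enum import Enum


class Connectedness(Enum):
    CONNECTIONLESS = "Connectionless"
    CONNECTED = "Connected"
    MULTIPLEXING_CONNECTED = "Multiplexing Connected"


class DataUnit(Enum):
    BYTE_STREAM = "Byte-stream"
    DATAGRAM = "Datagram"
    MESSAGE = "Message"


@dataclass(frozen=True)
class ProtocolDescription:
    """Hypothetical per-protocol record used by an implementation."""
    name: str
    connectedness: Connectedness
    data_unit: DataUnit

    def needs_framer_for_messages(self) -> bool:
        # Byte-stream protocols define no Message boundaries themselves.
        return self.data_unit is DataUnit.BYTE_STREAM

    def can_clone_without_handshake(self) -> bool:
        return self.connectedness is Connectedness.MULTIPLEXING_CONNECTED


TCP = ProtocolDescription("TCP", Connectedness.CONNECTED,
                          DataUnit.BYTE_STREAM)
```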

1668	   Below, terms in capitals with a dot (e.g., "CONNECT.SCTP") refer to
1669	   the primitives with the same name in section 4 of [RFC8303].  For
1670	   further implementation details, the description of these primitives
1671	   in [RFC8303] points to section 3 of [RFC8303] and section 3 of
1672	   [RFC8304], which refers back to the relevant specifications for each
1673	   protocol.  This back-tracking method applies to all elements of
1674	   [RFC8923] (see appendix D of [I-D.ietf-taps-interface]): they are
1675	   listed in appendix A of [RFC8923] with an implementation hint in the
1676	   same style, pointing back to section 4 of [RFC8303].

1678	   This document defines the API mappings for protocols defined in
1679	   [RFC8923].  Other protocol mappings can be provided as separate
1680	   documents, following the mapping template in Appendix A.

1682	10.1.  TCP

1684	   Connectedness: Connected

1686	   Data Unit: Byte-stream

1688	   API mappings for TCP are as follows:

1690	   Connection Object:  TCP connections between two hosts map directly to
1691	      Connection objects.

1693	   Initiate:  CONNECT.TCP.  Calling Initiate on a TCP Connection causes
1694	      it to reserve a local port, and send a SYN to the Remote Endpoint.

1696	   InitiateWithSend:  CONNECT.TCP with parameter user message.  Early
1697	      safely replayable data is sent on a TCP Connection in the SYN, as
1698	      TCP Fast Open data.

1700	   Ready:  A TCP Connection is ready once the three-way handshake is
1701	      complete.

1703	   InitiateError:  Failure of CONNECT.TCP.  TCP can throw various errors
1704	      during connection setup.  Specifically, it is important to handle
1705	      a RST being sent by the peer during the handshake.

1707	   ConnectionError:  Once established, TCP throws errors whenever the
1708	      connection is disconnected, such as due to receiving a RST from
1709	      the peer.

1711	   Listen:  LISTEN.TCP.  Calling Listen for TCP binds a local port and
1712	      prepares it to receive inbound SYN packets from peers.

1714	   ConnectionReceived:  TCP Listeners will deliver new connections once
1715	      they have replied to an inbound SYN with a SYN-ACK.

1717	   Clone:  Calling Clone on a TCP Connection creates a new Connection
1718	      with equivalent parameters.  These Connections, and Connections
1719	      generated via later calls to Clone on an Established Connection,
1720	      form a Connection Group.  To realize entanglement for these
1721	      Connections, with the exception of Connection Priority, changing a
1722	      Connection Property on one of them must affect the Connection
1723	      Properties of the others too.  No guarantees of honoring the
1724	      Connection Property Connection Priority are given, and thus it is
1725	      safe for an implementation of a transport system to ignore this
1726	      property.  When it is reasonable to assume that Connections
1727	      traverse the same path (e.g., when they share the same
1728	      encapsulation), support for Connection Priority can also be
1729	      implemented experimentally using a congestion control coupling
1730	      mechanism (see, for example, [TCP-COUPLING] or [RFC3124]).

1732	   Send:  SEND.TCP.  TCP does not on its own preserve Message
1733	      boundaries.  Calling Send on a TCP connection lays out the bytes
1734	      on the TCP send stream without any other delineation.  Any Message
1735	      marked as Final will cause TCP to send a FIN once the Message has
1736	      been completely written, by calling CLOSE.TCP immediately upon
1737	      successful termination of SEND.TCP.  Note that transmitting a
1738	      Message marked as Final should not cause the Closed event to be
1739	      delivered to the application, as it will still be possible to
1740	      receive data until the peer closes or aborts the TCP connection.

1742	   Receive:  With RECEIVE.TCP, TCP delivers a stream of bytes without
1743	      any Message delineation.  All data delivered in the Received or
1744	      ReceivedPartial event will be part of a single stream-wide Message
1745	      that is marked Final (unless a Message Framer is used).
1746	      EndOfMessage will be delivered when the TCP Connection has
1747	      received a FIN (CLOSE-EVENT.TCP) from the peer.  Note that
1748	      reception of a FIN should not cause the Closed event to be
1749	      delivered to the application, as it will still be possible for the
1750	      application to send data.

1752	   Close:  Calling Close on a TCP Connection indicates that the
1753	      Connection should be gracefully closed (CLOSE.TCP) by sending a
1754	      FIN to the peer.  It will then still be possible to receive data
1755	      until the peer closes or aborts the TCP connection.  The Closed
1756	      event will be issued upon reception of a FIN.

1758	   Abort:  Calling Abort on a TCP Connection indicates that the
1759	      Connection should be immediately closed by sending a RST to the
1760	      peer (ABORT.TCP).
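
   The Send, Receive, Close, and Abort mappings above can be
   illustrated with the socket API.  The following Python sketch is
   non-normative and assumes a POSIX-like stack; the function names
   (send_final, receive_until_fin, abort, demo) are invented for this
   example.  It shows that sending a Final Message only half-closes
   the connection, and that an empty read marks the peer's FIN.  The
   zero-linger technique in abort() is a common way to provoke a RST,
   not something this document mandates.

```python
import socket
import struct


def send_final(conn: socket.socket, data: bytes) -> None:
    """Send a Message marked Final: write the bytes, then send a FIN.

    The socket stays open for reading, so no Closed event is implied.
    """
    conn.sendall(data)
    conn.shutdown(socket.SHUT_WR)


def receive_until_fin(conn: socket.socket) -> bytes:
    """Collect the single stream-wide Message.

    recv() returning b"" signals the peer's FIN, i.e. EndOfMessage.
    """
    chunks = []
    while True:
        chunk = conn.recv(4096)
        if not chunk:
            return b"".join(chunks)
        chunks.append(chunk)


def abort(conn: socket.socket) -> None:
    """Close with a zero linger timeout so that common stacks send a RST.

    The struct layout ("ii") is an assumption that holds on POSIX systems.
    """
    conn.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                    struct.pack("ii", 1, 0))
    conn.close()


def demo() -> tuple:
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))        # Listen: bind a local port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())  # Initiate
    server, _ = listener.accept()          # ConnectionReceived
    send_final(client, b"request")         # client sends FIN after data
    request = receive_until_fin(server)
    server.sendall(b"reply")               # still possible after peer's FIN
    server.close()
    reply = receive_until_fin(client)      # client can still receive
    client.close()
    listener.close()
    return request, reply
```

   Calling demo() on a loopback interface exchanges one Message in
   each direction over a single TCP connection, with the client's
   direction already closed when the reply is sent.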

1762	10.2.  MPTCP

1764	   Connectedness: Connected

1766	   Data Unit: Byte-stream

1768	   The Transport Services API mappings for MPTCP are identical to TCP.
1769	   MPTCP adds support for multipath properties, such as "Multipath
1770	   Transport" and "Policy for using Multipath Transports".

1772	10.3.  UDP

1774	   Connectedness: Connectionless

1776	   Data Unit: Datagram

1778	   API mappings for UDP are as follows:

1780	   Connection Object:  UDP connections represent a pair of specific IP
1781	      addresses and ports on two hosts.

1783	   Initiate:  CONNECT.UDP.  Calling Initiate on a UDP Connection causes
1784	      it to reserve a local port, but does not generate any traffic.

1786	   InitiateWithSend:  Early data on a UDP Connection does not have any
1787	      special meaning.  The data is sent whenever the Connection is
1788	      Ready.

1790	   Ready:  A UDP Connection is ready once the system has reserved a
1791	      local port and has a path to send to the Remote Endpoint.

1793	   InitiateError:  UDP Connections can only generate errors on
1794	      initiation due to port conflicts on the local system.

1796	   ConnectionError:  Once in use, UDP throws "soft errors" (ERROR.UDP(-
1797	      Lite)) upon receiving ICMP notifications indicating failures in
1798	      the network.

1800	   Listen:  LISTEN.UDP.  Calling Listen for UDP binds a local port and
1801	      prepares it to receive inbound UDP datagrams from peers.

1803	   ConnectionReceived:  UDP Listeners will deliver new connections once
1804	      they have received traffic from a new Remote Endpoint.

1806	   Clone:  Calling Clone on a UDP Connection creates a new Connection
1807	      with equivalent parameters.  The two Connections are otherwise
1808	      independent.

1810	   Send:  SEND.UDP(-Lite).  Calling Send on a UDP connection sends the
1811	      data as the payload of a complete UDP datagram.  Marking Messages
1812	      as Final does not change anything in the datagram's contents.
1813	      Upon sending a UDP datagram, some relevant fields and flags in the
1814	      IP header can be controlled: DSCP (SET_DSCP.UDP(-Lite)), DF in
1815	      IPv4 (SET_DF.UDP(-Lite)) and ECN flag (SET_ECN.UDP(-Lite)).

1817	   Receive:  RECEIVE.UDP(-Lite).  UDP only delivers complete Messages to
1818	      Received, each of which represents a single datagram received in a
1819	      UDP packet.  Upon receiving a UDP datagram, the ECN flag from the
1820	      IP header can be obtained (GET_ECN.UDP(-Lite)).

1822	   Close:  Calling Close on a UDP Connection (ABORT.UDP(-Lite)) releases
1823	      the local port reservation.

1825	   Abort:  Calling Abort on a UDP Connection (ABORT.UDP(-Lite)) is
1826	      identical to calling Close.
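
   As a non-normative illustration of the UDP mapping, the Python
   sketch below uses the socket API on the loopback interface.  The
   class UdpListener and the functions initiate and demo are names
   invented for this example.  It shows that each Send produces one
   complete datagram and that a Listener can raise ConnectionReceived
   the first time it sees traffic from a new Remote Endpoint.

```python
import socket


class UdpListener:
    """Bind a local port and track which Remote Endpoints were seen."""

    def __init__(self, host: str = "127.0.0.1"):
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        self.sock.bind((host, 0))          # Listen: bind a local port
        self.sock.settimeout(2.0)
        self.known = set()

    def receive(self):
        """Return (is_new_connection, remote_endpoint, message)."""
        data, remote = self.sock.recvfrom(65535)   # one complete datagram
        is_new = remote not in self.known          # ConnectionReceived?
        self.known.add(remote)
        return is_new, remote, data


def initiate(remote) -> socket.socket:
    """Reserve a local port; connect() on UDP generates no traffic."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.connect(remote)
    return sock


def demo():
    listener = UdpListener()
    conn = initiate(listener.sock.getsockname())
    conn.send(b"first")      # Send: each call is one datagram
    conn.send(b"second")
    events = [listener.receive() for _ in range(2)]
    conn.close()             # Close: release the local port
    listener.sock.close()
    return [(is_new, data) for is_new, _, data in events]
```

   In this sketch only the first datagram from a given address and
   port pair is reported as a new Connection; later datagrams from the
   same pair are delivered as further Messages on it.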

1828	10.4.  UDP-Lite

1830	   Connectedness: Connectionless

1832	   Data Unit: Datagram

1834	   The Transport Services API mappings for UDP-Lite are identical to
1835	   UDP.  In addition, UDP-Lite supports the Properties that control
1836	   checksum coverage, such as "Corruption Protection Length", "Full
1837	   Checksum Coverage on Sending", "Required Minimum Corruption
1838	   Protection Coverage for Receiving", and "Full Checksum Coverage on
	   Receiving".

1840	10.5.  UDP Multicast Receive

1842	   Connectedness: Connectionless

1844	   Data Unit: Datagram

1846	   API mappings for Receiving Multicast UDP are as follows:

1848	   Connection Object:  Established UDP Multicast Receive connections
1849	      represent a pair of specific IP addresses and ports.  The
1850	      "unidirectional receive" transport property is required, and the
1851	      Local Endpoint must be configured with a group IP address and a
1852	      port.

1854	   Initiate:  Calling Initiate on a UDP Multicast Receive Connection
1855	      causes an immediate InitiateError.  This is an unsupported
1856	      operation.

1858	   InitiateWithSend:  Calling InitiateWithSend on a UDP Multicast
1859	      Receive Connection causes an immediate InitiateError.  This is an
1860	      unsupported operation.

1862	   Ready:  A UDP Multicast Receive Connection is ready once the system
1863	      has received traffic for the appropriate group and port.

1865	   InitiateError:  UDP Multicast Receive Connections generate an
1866	      InitiateError if Initiate is called.

1868	   ConnectionError:  Once in use, UDP throws "soft errors" (ERROR.UDP(-
1869	      Lite)) upon receiving ICMP notifications indicating failures in
1870	      the network.

1872	   Listen:  LISTEN.UDP.  Calling Listen for UDP Multicast Receive binds
1873	      a local port, prepares it to receive inbound UDP datagrams from
1874	      peers, and issues a multicast host join.  If a Remote Endpoint
1875	      with an address is supplied, the join is Source-specific
1876	      Multicast, and the path selection is based on the route to the
1877	      Remote Endpoint.  If a Remote Endpoint is not supplied, the join
1878	      is Any-source Multicast, and the path selection is based on the
1879	      outbound route to the group supplied in the Local Endpoint.

1881	   There are cases where it is required to open multiple connections for
1882	   the same address(es).  For example, one Connection might be opened
1883	   for a multicast group used as a multicast control bus, and another
1884	   application later opens a separate Connection to the same group to
1885	   send signals to and/or receive signals from the common bus.  In such
1886	   cases, the Transport Services system needs to explicitly enable re-
1887	   use of the same set of addresses (equivalent to setting SO_REUSEADDR
1888	   in the socket API).

1890	   ConnectionReceived:  UDP Multicast Receive Listeners will deliver new
1891	      connections once they have received traffic from a new Remote
1892	      Endpoint.

1894	   Clone:  Calling Clone on a UDP Multicast Receive Connection creates a
1895	      new Connection with equivalent parameters.  The two Connections
1896	      are otherwise independent.

1898	   Send:  SEND.UDP(-Lite).  Calling Send on a UDP Multicast Receive
1899	      connection causes an immediate SendError.  This is an unsupported
1900	      operation.

1902	   Receive:  RECEIVE.UDP(-Lite).  The Receive operation in a UDP
1903	      Multicast Receive connection only delivers complete Messages to
1904	      Received, each of which represents a single datagram received in a
1905	      UDP packet.  Upon receiving a UDP datagram, the ECN flag from the
1906	      IP header can be obtained (GET_ECN.UDP(-Lite)).

1908	   Close:  Calling Close on a UDP Multicast Receive Connection
1909	      (ABORT.UDP(-Lite)) releases the local port reservation and leaves
1910	      the group.

1912	   Abort:  Calling Abort on a UDP Multicast Receive Connection
1913	      (ABORT.UDP(-Lite)) is identical to calling Close.
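
   The mapping for UDP Multicast Receive can be sketched as follows.
   This Python example is non-normative; the class
   MulticastReceiveConnection, the function join_mode, and the two
   exception classes are names invented for this illustration.  Only
   the Any-source join is written out, using the IP_ADD_MEMBERSHIP
   socket option for IPv4; a Source-specific join needs a
   platform-dependent option and is deliberately left out.

```python
import socket
import struct


class InitiateError(Exception):
    pass


class SendError(Exception):
    pass


def join_mode(remote_endpoint) -> str:
    """Source-specific if a Remote Endpoint address is supplied."""
    return "source-specific" if remote_endpoint else "any-source"


class MulticastReceiveConnection:
    def __init__(self, group: str, port: int, remote_endpoint=None):
        self.group = group
        self.port = port
        self.remote_endpoint = remote_endpoint
        self.sock = None

    def initiate(self):
        raise InitiateError("Initiate is unsupported for multicast receive")

    def send(self, data: bytes):
        raise SendError("Send is unsupported for multicast receive")

    def listen(self):
        """Bind the port and issue an Any-source multicast host join."""
        if join_mode(self.remote_endpoint) != "any-source":
            raise NotImplementedError("source-specific join not sketched")
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        # Let several Connections share the same group address and port.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", self.port))
        mreq = struct.pack("4s4s", socket.inet_aton(self.group),
                           socket.inet_aton("0.0.0.0"))
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
        self.sock = sock

    def close(self):
        """Closing the socket releases the port and leaves the group."""
        if self.sock is not None:
            self.sock.close()
            self.sock = None
```

   Whether listen() succeeds depends on the host having a
   multicast-capable interface, so a real implementation would also
   need to report a failed join to the application.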

1915	10.6.  SCTP

1917	   Connectedness: Connected

1919	   Data Unit: Message

1921	   API mappings for SCTP are as follows:

1923	   Connection Object:  Connection objects can be mapped to an SCTP
1924	      association or a stream in an SCTP association.  Mapping
1925	      Connection objects to SCTP streams is called "stream mapping" and
1926	      has additional requirements as follows.  The following explanation
1927	      assumes a client-server communication model.

1929	   Stream mapping requires an association to already be in place between
1930	   the client and the server, and it requires the server to understand
1931	   that a new incoming stream should be represented as a new Connection
1932	   Object by the Transport Services system.  A new SCTP stream is
1933	   created by sending an SCTP message with a new stream id.  Thus, to
1934	   implement stream mapping, the Transport Services API MUST provide a
1935	   newly created Connection Object to the application upon the reception
1936	   of such a message.  The necessary semantics to implement a Transport
1937	   Services system Close and Abort primitives are provided by the stream
1938	   reconfiguration (reset) procedure described in [RFC6525].  This also
1939	   allows a stream id to be re-used after resetting ("closing") it.
1940	   To implement this functionality, SCTP stream reconfiguration
1941	   [RFC6525] MUST be supported by both the client and the server side.

1943	   To avoid head-of-line blocking, stream mapping SHOULD only be
1944	   implemented when both sides support message interleaving [RFC8260].
1945	   This allows a sender to schedule transmissions between multiple
1946	   streams without risking that transmission of a large message on one
1947	   stream might block transmissions on other streams for a long time.

1949	   To avoid conflicts between stream ids, the following procedure is
1950	   recommended: the first Connection, for which the SCTP association has
1951	   been created, MUST always use stream id zero.  All additional
1952	   Connections are assigned to unused stream ids in growing order.  To
1953	   avoid a conflict when both endpoints map new Connections
1954	   simultaneously, the peer which initiated association MUST use even
1955	   stream ids whereas the remote side MUST map its Connections to odd
1956	   stream ids.  Both sides maintain a status map of the assigned stream
1957	   ids.  Generally, new streams SHOULD consume the lowest available
1958	   (even or odd, depending on the side) stream id; this rule is relevant
1959	   when lower ids become available because Connection objects associated
1960	   with the streams are closed.

1962	   SCTP stream mapping as described here has been implemented in a
1963	   research prototype; a description of this implementation is given in
1964	   [NEAT-flow-mapping].
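
   The stream id assignment procedure described above can be sketched
   as follows.  This Python example is non-normative and is not taken
   from the research prototype; the class StreamIdAllocator and its
   method names are invented for this illustration.  It assumes that
   stream id zero is already taken by the first Connection on both
   sides, and that ids chosen by the peer are recorded in the same
   status map.

```python
class StreamIdAllocator:
    """Status map of assigned stream ids for one SCTP association."""

    def __init__(self, initiated_association: bool):
        # The side that initiated the association uses even ids,
        # the remote side uses odd ids.
        self.parity = 0 if initiated_association else 1
        # The first Connection always uses stream id zero.
        self.in_use = {0}

    def allocate(self) -> int:
        """Return the lowest unused stream id of this side's parity."""
        stream_id = self.parity
        while stream_id in self.in_use:
            stream_id += 2
        self.in_use.add(stream_id)
        return stream_id

    def mark_peer_stream(self, stream_id: int) -> None:
        """Record an id on which the peer opened a new Connection."""
        self.in_use.add(stream_id)

    def release(self, stream_id: int) -> None:
        """Free an id once its stream has been reset ("closed")."""
        self.in_use.discard(stream_id)
```

   Because each side only ever allocates ids of its own parity, two
   endpoints that map new Connections at the same time cannot pick the
   same stream id.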

1966	   Initiate:  If this is the only Connection object that is assigned to
1967	      the SCTP Association or stream mapping is not used, CONNECT.SCTP
1968	      is called.  Else, unless the Selection Property
1969	      activeReadBeforeSend is Preferred or Required, a new stream is
1970	      used: if there are enough streams available, Initiate is a local
1971	      operation that assigns a new stream id to the Connection object.
1972	      The number of streams is negotiated as a parameter of the prior
1973	      CONNECT.SCTP call, and it represents a trade-off between local
1974	      resource usage and the number of Connection objects that can be
1975	      mapped without requiring a reconfiguration signal.  When running
1976	      out of streams, ADD_STREAM.SCTP must be called.

1978	   InitiateWithSend:  If this is the only Connection object that is
1979	      assigned to the SCTP association or stream mapping is not used,
1980	      CONNECT.SCTP is called with the "user message" parameter.  Else, a
1981	      new stream is used (see Initiate for how to handle running out of
1982	      streams), and this just sends the first message on a new stream.

1984	   Ready:  Initiate or InitiateWithSend returns without an error, i.e.
1985	      SCTP's four-way handshake has completed.  If an association with
1986	      the peer already exists, stream mapping is used and enough streams
1987	      are available, a Connection Object instantly becomes Ready after
1988	      calling Initiate or InitiateWithSend.

1990	   InitiateError:  Failure of CONNECT.SCTP.

1992	   ConnectionError:  TIMEOUT.SCTP or ABORT-EVENT.SCTP.

1994	   Listen:  LISTEN.SCTP.  If an association with the peer already exists
1995	      and stream mapping is used, Listen just expects to receive a new
1996	      message with a new stream id (chosen in accordance with the stream
1997	      id assignment procedure described above).

1999	   ConnectionReceived:  LISTEN.SCTP returns without an error (a result
2000	      of successful CONNECT.SCTP from the peer), or, in case of stream
2001	      mapping, the first message has arrived on a new stream (in this
2002	      case, Receive is also invoked).

2004	   Clone:  Calling Clone on an SCTP association creates a new Connection
2005	      object and assigns it a new stream id in accordance with the
2006	      stream id assignment procedure described above.  If there are not
2007	      enough streams available, ADD_STREAM.SCTP must be called.

2009	   Priority (Connection):  When this value is changed, or a Message with
2010	      Message Property Priority is sent, and there are multiple
2011	      Connection objects assigned to the same SCTP association,
2012	      CONFIGURE_STREAM_SCHEDULER.SCTP is called to adjust the priorities
2013	      of streams in the SCTP association.

2015	   Send:  SEND.SCTP.  Message Properties such as Lifetime and Ordered
2016	      map to parameters of this primitive.

2018	   Receive:  RECEIVE.SCTP.  The "partial flag" of RECEIVE.SCTP invokes a
2019	      ReceivedPartial event.

2021	   Close: If this is the only Connection object that is assigned to the
2022	   SCTP association, CLOSE.SCTP is called, and the Closed event will be
2023	   delivered to the application upon the ensuing CLOSE-EVENT.SCTP.
2024	   Else, the Connection object is one out of several Connection objects
2025	   that are assigned to the same SCTP association, and RESET_STREAM.SCTP
2026	   must be called, which informs the peer that the stream will no longer
2027	   be used for mapping and can be used by future Initiate,
2028	   InitiateWithSend or Listen calls.  At the peer, the event
2029	   RESET_STREAM-EVENT.SCTP will fire, which the peer must answer by
2030	   issuing RESET_STREAM.SCTP too.  The resulting local RESET_STREAM-
2031	   EVENT.SCTP informs the Transport Services system that the stream id
2032	   can now be re-used by the next Initiate, InitiateWithSend or Listen
2033	   calls, and invokes a Closed event towards the application.

2035	   Abort: If this is the only Connection object that is assigned to the
2036	   SCTP association, ABORT.SCTP is called.  Else, the Connection object
2037	   is one out of several Connection objects that are assigned to the
2038	   same SCTP association, and shutdown proceeds as described for Close.
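
   The choice of primitive for Initiate, Close, and Abort depends on
   whether stream mapping is in use and on how many Connection objects
   share the association.  The Python sketch below is non-normative;
   the function names and parameters are invented for this example,
   and it leaves out the activeReadBeforeSend Selection Property that
   the Initiate mapping mentions.

```python
def initiate_primitives(has_association: bool, stream_mapping: bool,
                        free_streams: int) -> list:
    """Return the [RFC8303] primitives that Initiate has to call."""
    if not has_association or not stream_mapping:
        return ["CONNECT.SCTP"]
    if free_streams == 0:
        return ["ADD_STREAM.SCTP"]      # ran out of negotiated streams
    return []                           # local operation: assign a stream id


def close_primitive(connections_on_association: int) -> str:
    """CLOSE.SCTP for the only Connection, else reset just the stream."""
    if connections_on_association == 1:
        return "CLOSE.SCTP"
    return "RESET_STREAM.SCTP"


def abort_primitive(connections_on_association: int) -> str:
    """ABORT.SCTP for the only Connection, else proceed as for Close."""
    if connections_on_association == 1:
        return "ABORT.SCTP"
    return "RESET_STREAM.SCTP"
```

   For example, closing one of three Connections mapped to the same
   association resets only that Connection's stream and leaves the
   association in place for the other two.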

2040	11.  IANA Considerations

2042	   RFC-EDITOR: Please remove this section before publication.

2044	   This document has no actions for IANA.

2046	12.  Security Considerations

2048	   [I-D.ietf-taps-arch] outlines general security considerations and
2049	   requirements for any system that implements the Transport Services
2050	   architecture.  [I-D.ietf-taps-interface] provides further discussion
2051	   on security and privacy implications of the Transport Services API.
2052	   This document provides additional guidance on implementation
2053	   specifics for the Transport Services API and as such the security
2054	   considerations in both of these documents apply.  The next two
2055	   subsections discuss further considerations that are specific to
2056	   mechanisms specified in this document.

2058	12.1.  Considerations for Candidate Gathering

2060	   Implementations should avoid downgrade attacks that allow network
2061	   interference to cause the implementation to select less secure, or
2062	   entirely insecure, combinations of paths and protocols.

2064	12.2.  Considerations for Candidate Racing

2066	   See Section 5.3 for security considerations around racing with 0-RTT
2067	   data.

2069	   An attacker that knows a particular device is racing several options
2070	   during connection establishment may be able to block packets for the
2071	   first connection attempt, thus inducing the device to fall back to a
2072	   secondary attempt.  This is a problem if the secondary attempts have
2073	   worse security properties that enable further attacks.
2074	   Implementations should ensure that all options have equivalent
2075	   security properties to avoid incentivizing attacks.
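
   One possible way to keep raced options equivalent is to restrict
   the race to candidates that offer the same security protocol as the
   preferred candidate.  The Python sketch below is non-normative; the
   Candidate type, its security_protocol field, and the function
   raceable are hypothetical names, and comparing protocol names for
   equality is a simplifying assumption about what "equivalent" means.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    security_protocol: str   # hypothetical label, "" means no security


def raceable(candidates: list) -> list:
    """Keep the candidates whose security matches the preferred one.

    The first element is assumed to be the preferred candidate.
    """
    if not candidates:
        return []
    wanted = candidates[0].security_protocol
    return [c for c in candidates if c.security_protocol == wanted]
```

   With this filter, blocking the first attempt cannot push the
   implementation onto a fallback with weaker protection, because no
   such fallback takes part in the race.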

2077	   Since results from the network can determine how a connection attempt
2078	   tree is built, such as when DNS returns a list of resolved endpoints,
2079	   it is possible for the network to cause an implementation to consume
2080	   significant on-device resources.  Implementations should limit the
2081	   maximum amount of state allowed for any given node, including the
2082	   number of child nodes, especially when the state is based on results
2083	   from the network.
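
   A limit on child nodes can be enforced at the point where
   network-supplied results are attached to the tree.  The Python
   sketch below is non-normative; the Node class, the add_children
   function, and the value of MAX_CHILDREN are assumptions made for
   this example, not values recommended by this document.

```python
MAX_CHILDREN = 8   # hypothetical cap, to be tuned by the implementation


class Node:
    def __init__(self, label: str):
        self.label = label
        self.children = []


def add_children(node: Node, labels: list, limit: int = MAX_CHILDREN) -> int:
    """Attach at most `limit` children in total to `node`.

    Returns the number of labels that were dropped.
    """
    room = max(0, limit - len(node.children))
    accepted = labels[:room]
    node.children.extend(Node(label) for label in accepted)
    return len(labels) - len(accepted)
```

   Here a DNS answer with twenty addresses would yield eight child
   nodes and twelve dropped entries, so the size of the answer no
   longer determines the amount of state that is created.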

2085	13.  Acknowledgements

2087	   This work has received funding from the European Union's Horizon 2020
2088	   research and innovation programme under grant agreement No. 644334
2089	   (NEAT) and No. 815178 (5GENESIS).

2091	   This work has been supported by Leibniz Prize project funds of DFG -
2092	   German Research Foundation: Gottfried Wilhelm Leibniz-Preis 2011 (FKZ
2093	   FE 570/4-1).

2095	   This work has been supported by the UK Engineering and Physical
2096	   Sciences Research Council under grant EP/R04144X/1.

2098	   This work has been supported by the Research Council of Norway under
2099	   its "Toppforsk" programme through the "OCARINA" project.

2101	   Thanks to Colin Perkins, Tom Jones, Karl-Johan Grinnemo, and Gorry
2102	   Fairhurst for their contributions to the design of this
2103	   specification.  Thanks also to Stuart Cheshire, Josh Graessley, David
2104	   Schinazi, and Eric Kinnear for their implementation and design
2105	   efforts, including Happy Eyeballs, that heavily influenced this work.

2107	14.  References

2109	14.1.  Normative References

2111	   [I-D.ietf-taps-arch]
2112	              Pauly, T., Trammell, B., Brunstrom, A., Fairhurst, G., and
2113	              C. Perkins, "An Architecture for Transport Services", Work
2114	              in Progress, Internet-Draft, draft-ietf-taps-arch-12, 3
2115	              January 2022, <https://datatracker.ietf.org/doc/html/
2116	              draft-ietf-taps-arch-12>.

2118	   [I-D.ietf-taps-interface]
2119	              Trammell, B., Welzl, M., Enghardt, T., Fairhurst, G.,
2120	              Kuehlewind, M., Perkins, C., Tiesel, P. S., Wood, C. A.,
2121	              Pauly, T., and K. Rose, "An Abstract Application Layer
2122	              Interface to Transport Services", Work in Progress,
2123	              Internet-Draft, draft-ietf-taps-interface-14, 3 January
2124	              2022, <https://datatracker.ietf.org/doc/html/draft-ietf-
2125	              taps-interface-14>.

2127	   [RFC7413]  Cheng, Y., Chu, J., Radhakrishnan, S., and A. Jain, "TCP
2128	              Fast Open", RFC 7413, DOI 10.17487/RFC7413, December 2014,
2129	              <https://www.rfc-editor.org/rfc/rfc7413>.

2131	   [RFC7540]  Belshe, M., Peon, R., and M. Thomson, Ed., "Hypertext
2132	              Transfer Protocol Version 2 (HTTP/2)", RFC 7540,
2133	              DOI 10.17487/RFC7540, May 2015,
2134	              <https://www.rfc-editor.org/rfc/rfc7540>.

2136	   [RFC8303]  Welzl, M., Tuexen, M., and N. Khademi, "On the Usage of
2137	              Transport Features Provided by IETF Transport Protocols",
2138	              RFC 8303, DOI 10.17487/RFC8303, February 2018,
2139	              <https://www.rfc-editor.org/rfc/rfc8303>.

2141	   [RFC8304]  Fairhurst, G. and T. Jones, "Transport Features of the
2142	              User Datagram Protocol (UDP) and Lightweight UDP (UDP-
2143	              Lite)", RFC 8304, DOI 10.17487/RFC8304, February 2018,
2144	              <https://www.rfc-editor.org/rfc/rfc8304>.

2146	   [RFC8305]  Schinazi, D. and T. Pauly, "Happy Eyeballs Version 2:
2147	              Better Connectivity Using Concurrency", RFC 8305,
2148	              DOI 10.17487/RFC8305, December 2017,
2149	              <https://www.rfc-editor.org/rfc/rfc8305>.

2151	   [RFC8421]  Martinsen, P., Reddy, T., and P. Patil, "Guidelines for
2152	              Multihomed and IPv4/IPv6 Dual-Stack Interactive
2153	              Connectivity Establishment (ICE)", BCP 217, RFC 8421,
2154	              DOI 10.17487/RFC8421, July 2018,
2155	              <https://www.rfc-editor.org/rfc/rfc8421>.

2157	   [RFC8446]  Rescorla, E., "The Transport Layer Security (TLS) Protocol
2158	              Version 1.3", RFC 8446, DOI 10.17487/RFC8446, August 2018,
2159	              <https://www.rfc-editor.org/rfc/rfc8446>.

2161	   [RFC8923]  Welzl, M. and S. Gjessing, "A Minimal Set of Transport
2162	              Services for End Systems", RFC 8923, DOI 10.17487/RFC8923,
2163	              October 2020, <https://www.rfc-editor.org/rfc/rfc8923>.

2165	14.2.  Informative References

2167	   [I-D.ietf-quic-transport]
2168	              Iyengar, J. and M. Thomson, "QUIC: A UDP-Based Multiplexed
2169	              and Secure Transport", Work in Progress, Internet-Draft,
2170	              draft-ietf-quic-transport-34, 14 January 2021,
2171	              <https://datatracker.ietf.org/doc/html/draft-ietf-quic-
2172	              transport-34>.

2174	   [I-D.ietf-tcpm-2140bis]
2175	              Touch, J., Welzl, M., and S. Islam, "TCP Control Block
2176	              Interdependence", Work in Progress, Internet-Draft, draft-
2177	              ietf-tcpm-2140bis-11, 12 April 2021,
2178	              <https://datatracker.ietf.org/doc/html/draft-ietf-tcpm-
2179	              2140bis-11>.

2181	   [NEAT-flow-mapping]
2182	              "Transparent Flow Mapping for NEAT", IFIP NETWORKING 2017
2183	              Workshop on Future of Internet Transport (FIT 2017),
2184	              2017.

2186	   [RFC1928]  Leech, M., Ganis, M., Lee, Y., Kuris, R., Koblas, D., and
2187	              L. Jones, "SOCKS Protocol Version 5", RFC 1928,
2188	              DOI 10.17487/RFC1928, March 1996,
2189	              <https://www.rfc-editor.org/rfc/rfc1928>.

2191	   [RFC3124]  Balakrishnan, H. and S. Seshan, "The Congestion Manager",
2192	              RFC 3124, DOI 10.17487/RFC3124, June 2001,
2193	              <https://www.rfc-editor.org/rfc/rfc3124>.

2195	   [RFC3207]  Hoffman, P., "SMTP Service Extension for Secure SMTP over
2196	              Transport Layer Security", RFC 3207, DOI 10.17487/RFC3207,
2197	              February 2002, <https://www.rfc-editor.org/rfc/rfc3207>.

2199	   [RFC5389]  Rosenberg, J., Mahy, R., Matthews, P., and D. Wing,
2200	              "Session Traversal Utilities for NAT (STUN)", RFC 5389,
2201	              DOI 10.17487/RFC5389, October 2008,
2202	              <https://www.rfc-editor.org/rfc/rfc5389>.

2204	   [RFC5766]  Mahy, R., Matthews, P., and J. Rosenberg, "Traversal Using
2205	              Relays around NAT (TURN): Relay Extensions to Session
2206	              Traversal Utilities for NAT (STUN)", RFC 5766,
2207	              DOI 10.17487/RFC5766, April 2010,
2208	              <https://www.rfc-editor.org/rfc/rfc5766>.

2210	   [RFC6525]  Stewart, R., Tuexen, M., and P. Lei, "Stream Control
2211	              Transmission Protocol (SCTP) Stream Reconfiguration",
2212	              RFC 6525, DOI 10.17487/RFC6525, February 2012,
2213	              <https://www.rfc-editor.org/rfc/rfc6525>.

2215	   [RFC6762]  Cheshire, S. and M. Krochmal, "Multicast DNS", RFC 6762,
2216	              DOI 10.17487/RFC6762, February 2013,
2217	              <https://www.rfc-editor.org/rfc/rfc6762>.

2219	   [RFC6763]  Cheshire, S. and M. Krochmal, "DNS-Based Service
2220	              Discovery", RFC 6763, DOI 10.17487/RFC6763, February 2013,
2221	              <https://www.rfc-editor.org/rfc/rfc6763>.

2223	   [RFC7230]  Fielding, R., Ed. and J. Reschke, Ed., "Hypertext Transfer
2224	              Protocol (HTTP/1.1): Message Syntax and Routing",
2225	              RFC 7230, DOI 10.17487/RFC7230, June 2014,
2226	              <https://www.rfc-editor.org/rfc/rfc7230>.

2228	   [RFC7657]  Black, D., Ed. and P. Jones, "Differentiated Services
2229	              (Diffserv) and Real-Time Communication", RFC 7657,
2230	              DOI 10.17487/RFC7657, November 2015,
2231	              <https://www.rfc-editor.org/rfc/rfc7657>.

2233	   [RFC8085]  Eggert, L., Fairhurst, G., and G. Shepherd, "UDP Usage
2234	              Guidelines", BCP 145, RFC 8085, DOI 10.17487/RFC8085,
2235	              March 2017, <https://www.rfc-editor.org/rfc/rfc8085>.

2237	   [RFC8260]  Stewart, R., Tuexen, M., Loreto, S., and R. Seggelmann,
2238	              "Stream Schedulers and User Message Interleaving for the
2239	              Stream Control Transmission Protocol", RFC 8260,
2240	              DOI 10.17487/RFC8260, November 2017,
2241	              <https://www.rfc-editor.org/rfc/rfc8260>.

2243	   [RFC8445]  Keranen, A., Holmberg, C., and J. Rosenberg, "Interactive
2244	              Connectivity Establishment (ICE): A Protocol for Network
2245	              Address Translator (NAT) Traversal", RFC 8445,
2246	              DOI 10.17487/RFC8445, July 2018,
2247	              <https://www.rfc-editor.org/rfc/rfc8445>.

2249	   [TCP-COUPLING]
2250	              "ctrlTCP: Reducing Latency through Coupled, Heterogeneous
2251	              Multi-Flow TCP Congestion Control", IEEE INFOCOM Global
2252	              Internet Symposium (GI) workshop (GI 2018), 2018.

2254	Appendix A.  API Mapping Template

2256	   Any protocol mapping for the Transport Services API should follow a
2257	   common template.

2259	   Connectedness: (Connectionless/Connected/Multiplexing Connected)

2261	   Data Unit: (Byte-stream/Datagram/Message)

2263	   Connection Object:

2265	   Initiate:

2267	   InitiateWithSend:

2269	   Ready:

2271	   InitiateError:

2273	   ConnectionError:

2275	   Listen:

2277	   ConnectionReceived:

2279	   Clone:

2281	   Send:

2283	   Receive:

2285	   Close:

2287	   Abort:
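
   A mapping written against this template can be checked mechanically
   for missing entries.  The Python sketch below is non-normative; the
   names TEMPLATE_FIELDS and missing_entries are invented for this
   example, and representing a mapping as a dictionary of strings is
   an assumption made for brevity.

```python
# Entries of the template, in the order given above.
TEMPLATE_FIELDS = (
    "Connectedness", "Data Unit", "Connection Object", "Initiate",
    "InitiateWithSend", "Ready", "InitiateError", "ConnectionError",
    "Listen", "ConnectionReceived", "Clone", "Send", "Receive",
    "Close", "Abort",
)


def missing_entries(mapping: dict) -> list:
    """Return the template entries that are absent or left empty."""
    return [field for field in TEMPLATE_FIELDS if not mapping.get(field)]
```

   A document that defines a new protocol mapping could use such a
   check to confirm that every entry of the template has been filled
   in.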

2289	Appendix B.  Additional Properties

2291	   This appendix discusses implementation considerations for additional
2292	   parameters and properties that could be used to enhance transport
2293	   protocol and/or path selection, or the transmission of messages given
2294	   a Protocol Stack that implements them.  These are not part of the
2295	   interface, and may be removed from the final document, but are
2296	   presented here to support discussion within the TAPS working group as
2297	   to whether they should be added to a future revision of the base
2298	   specification.

2300	B.1.  Properties Affecting Sorting of Branches

2302	   In addition to the Protocol and Path Selection Properties discussed
2303	   in Section 4.1.3, the following properties under discussion can
2304	   influence branch sorting:

2306	   *  Bounds on Send or Receive Rate: If the application indicates a
2307	      bound on the expected Send or Receive bitrate, an implementation
2308	      may prefer a path that can likely provide the desired bandwidth,
2309	      based on cached maximum throughput, see Section 9.2.  The
2310	      application may know the Send or Receive Bitrate from metadata in
2311	      adaptive HTTP streaming, such as MPEG-DASH.

2313	   *  Cost Preferences: If the application indicates a preference to
2314	      avoid expensive paths, and some paths are associated with a
2315	      monetary cost, an implementation should decrease the ranking of
2316	      such paths.  If the application indicates that it prohibits using
2317	      expensive paths, paths that are associated with a cost should be
2318	      purged from the decision tree.
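
   The two properties above could influence branch sorting as in the
   following non-normative Python sketch.  The Path type, its fields,
   and the function sort_branches are hypothetical, and ranking the
   rate bound ahead of the cost preference is an arbitrary choice made
   for this example.

```python
from dataclasses import dataclass


@dataclass
class Path:
    name: str
    expensive: bool
    cached_max_throughput: float   # bits per second, from a cache


def sort_branches(paths: list, avoid_expensive: bool = False,
                  prohibit_expensive: bool = False,
                  rate_bound: float = None) -> list:
    """Order paths; earlier entries are preferred."""
    if prohibit_expensive:
        # Prohibited paths are purged from the decision tree.
        paths = [p for p in paths if not p.expensive]

    def rank(path: Path):
        too_slow = (rate_bound is not None
                    and path.cached_max_throughput < rate_bound)
        costly = avoid_expensive and path.expensive
        return (too_slow, costly)      # False sorts before True

    return sorted(paths, key=rank)     # stable: ties keep input order
```

   Because the sort is stable, paths that are equal under these two
   properties keep the order produced by the other Selection
   Properties.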

2320	Appendix C.  Reasons for errors

2322	   The Transport Services API [I-D.ietf-taps-interface] allows
2323	   several generic error types to specify a more detailed reason as to
2324	   why an error occurred.  This appendix lists some of the possible
2325	   reasons.

2327	   *  InvalidConfiguration: The transport properties and endpoints
2328	      provided by the application are either contradictory or
2329	      incomplete.  Examples include the lack of a Remote Endpoint on an
2330	      active open or using a multicast group address while not
2331	      requesting a unidirectional receive.

2333	   *  NoCandidates: The configuration is valid, but none of the
2334	      available transport protocols can satisfy the transport properties
2335	      provided by the application.

2337	   *  ResolutionFailed: The remote or local specifier provided by the
2338	      application cannot be resolved.

2340	   *  EstablishmentFailed: The Transport Services system was unable to
2341	      establish a transport-layer connection to the Remote Endpoint
2342	      specified by the application.

2344	   *  PolicyProhibited: The system policy prevents the transport system
2345	      from performing the action requested by the application.

2347	   *  NotCloneable: The protocol stack is not capable of being cloned.

2349	   *  MessageTooLarge: The message size is too big for the transport
2350	      system to handle.

2352	   *  ProtocolFailed: The underlying protocol stack failed.

2354	   *  InvalidMessageProperties: The message properties either
2355	      contradict the transport properties or cannot be satisfied by
2356	      the transport system.

2358	   *  DeframingFailed: The data that was received by the underlying
2359	      protocol stack could not be deframed.

2361	   *  ConnectionAborted: The connection was aborted by the peer.

2363	   *  Timeout: Delivery of a message was not possible after a timeout.
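
One way an implementation might expose these reasons is as an
enumeration attached to a generic error object.  The following is a
minimal sketch under stated assumptions: the names ErrorReason,
TransportServicesError, and check_active_open are hypothetical and are
not defined by the Transport Services API; only the reason strings are
taken from the list above.

```python
from enum import Enum
from typing import Optional


class ErrorReason(Enum):
    """Reasons listed in this appendix (the enum itself is hypothetical)."""
    INVALID_CONFIGURATION = "InvalidConfiguration"
    NO_CANDIDATES = "NoCandidates"
    RESOLUTION_FAILED = "ResolutionFailed"
    ESTABLISHMENT_FAILED = "EstablishmentFailed"
    POLICY_PROHIBITED = "PolicyProhibited"
    NOT_CLONEABLE = "NotCloneable"
    MESSAGE_TOO_LARGE = "MessageTooLarge"
    PROTOCOL_FAILED = "ProtocolFailed"
    INVALID_MESSAGE_PROPERTIES = "InvalidMessageProperties"
    DEFRAMING_FAILED = "DeframingFailed"
    CONNECTION_ABORTED = "ConnectionAborted"
    TIMEOUT = "Timeout"


class TransportServicesError(Exception):
    """Generic error carrying a detailed reason (hypothetical type)."""

    def __init__(self, reason: ErrorReason, detail: str = "") -> None:
        message = f"{reason.value}: {detail}" if detail else reason.value
        super().__init__(message)
        self.reason = reason
        self.detail = detail


def check_active_open(remote_endpoint: Optional[str]) -> None:
    """Reject an active open that lacks a Remote Endpoint."""
    if remote_endpoint is None:
        raise TransportServicesError(
            ErrorReason.INVALID_CONFIGURATION,
            "active open requires a Remote Endpoint",
        )
```

An application can then branch on the reason attribute of the error
rather than parsing the message text.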

2365	Appendix D.  Existing Implementations

2367	   This appendix gives an overview of existing implementations, at the
2368	   time of writing, of transport systems that are (to some degree) in
2369	   line with this document.

2371	   *  Apple's Network.framework:

2373	      -  Network.framework is a transport-level API built for C,
2374	         Objective-C, and Swift.  It is a connect-by-name API that
2375	         supports transport security protocols.  It provides userspace
2376	         implementations of TCP, UDP, TLS, DTLS, and proxy protocols,
2377	         and allows extension via custom framers.

2379	      -  Documentation:
2380	         https://developer.apple.com/documentation/network

2382	   *  NEAT and NEATPy:

2384	      -  NEAT is the output of the European H2020 research project
2385	         "NEAT"; it is a user-space library for protocol-independent
2386	         communication on top of TCP, UDP and SCTP, with many more
2387	         features such as a policy manager.

2389	      -  Code:
2390	         https://github.com/NEAT-project/neat

2392	      -  NEAT project:
2393	         https://www.neat-project.org

2395	      -  NEATPy is a Python shim over NEAT which updates the NEAT API to
2396	         be in line with version 6 of the Transport Services API draft.

2398	      -  Code:
2399	         https://github.com/theagilepadawan/NEATPy

2401	   *  PyTAPS:

2403	      -  A TAPS implementation based on Python asyncio, offering
2404	         protocol-independent communication to applications on top of
2405	         TCP, UDP and TLS, with support for multicast.

2407	      -  Code:
2408	         https://github.com/fg-inet/python-asyncio-taps

2410	Authors' Addresses
2411	   Anna Brunstrom (editor)
2412	   Karlstad University
2413	   Universitetsgatan 2
2414	   651 88 Karlstad
2415	   Sweden
2416	   Email: anna.brunstrom@kau.se

2418	   Tommy Pauly (editor)
2419	   Apple Inc.
2420	   One Apple Park Way
2421	   Cupertino, California 95014
2422	   United States of America
2423	   Email: tpauly@apple.com

2425	   Theresa Enghardt
2426	   Netflix
2427	   121 Albright Way
2428	   Los Gatos, CA 95032
2429	   United States of America
2430	   Email: ietf@tenghardt.net

2432	   Philipp S. Tiesel
2433	   SAP SE
2434	   Konrad-Zuse-Ring 10
2435	   14469 Potsdam
2436	   Germany
2437	   Email: philipp@tiesel.net

2439	   Michael Welzl
2440	   University of Oslo
2441	   PO Box 1080 Blindern
2442	   0316 Oslo
2443	   Norway
2444	   Email: michawe@ifi.uio.no