idnits 2.17.1

draft-ietf-taps-impl-12.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** There are 7 instances of too long lines in the document, the longest one
     being 41 characters in excess of 72.

  ** The document seems to lack a both a reference to RFC 2119 and the
     recommended RFC 2119 boilerplate, even if it appears to use RFC 2119
     keywords.

     RFC 2119 keyword, line 1320: '... Implementations SHOULD ensure that th...'
     RFC 2119 keyword, line 1934: '... Transport Services API MUST provide a...'
     RFC 2119 keyword, line 1941: '... [RFC6525] MUST be supported by both the client and the server side....'
     RFC 2119 keyword, line 1943: '...locking, stream mapping SHOULD only be...'
     RFC 2119 keyword, line 1951: '... been created, MUST always use strea...'
     (3 more instances...)

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (7 March 2022) is 782 days in the past. Is this
     intentional?
  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  == Outdated reference: A later version (-19) exists of
     draft-ietf-taps-arch-12

  == Outdated reference: A later version (-26) exists of
     draft-ietf-taps-interface-14

  ** Obsolete normative reference: RFC 7540 (Obsoleted by RFC 9113)

  -- Obsolete informational reference (is this intentional?): RFC 5389
     (Obsoleted by RFC 8489)

  -- Obsolete informational reference (is this intentional?): RFC 5766
     (Obsoleted by RFC 8656)

  -- Obsolete informational reference (is this intentional?): RFC 7230
     (Obsoleted by RFC 9110, RFC 9112)

     Summary: 3 errors (**), 0 flaws (~~), 3 warnings (==), 4 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

2  TAPS Working Group                                     A. Brunstrom, Ed.
3  Internet-Draft                                       Karlstad University
4  Intended status: Informational                             T. Pauly, Ed.
5  Expires: 8 September 2022                                     Apple Inc.
6                                                               T. Enghardt
7                                                                   Netflix
8                                                                 P. Tiesel
9                                                                    SAP SE
10                                                                 M. Welzl
11                                                       University of Oslo
12                                                             7 March 2022

14               Implementing Interfaces to Transport Services
15                          draft-ietf-taps-impl-12

17  Abstract

19     The Transport Services system enables applications to use transport
20     protocols flexibly for network communication and defines a protocol-
21     independent Transport Services Application Programming Interface
22     (API) that is based on an asynchronous, event-driven interaction
23     pattern. This document serves as a guide to implementation on how to
24     build such a system.

26  Status of This Memo

28     This Internet-Draft is submitted in full conformance with the
29     provisions of BCP 78 and BCP 79.

31     Internet-Drafts are working documents of the Internet Engineering
32     Task Force (IETF). Note that other groups may also distribute
33     working documents as Internet-Drafts. The list of current Internet-
34     Drafts is at https://datatracker.ietf.org/drafts/current/.

36     Internet-Drafts are draft documents valid for a maximum of six months
37     and may be updated, replaced, or obsoleted by other documents at any
38     time. It is inappropriate to use Internet-Drafts as reference
39     material or to cite them other than as "work in progress."

41     This Internet-Draft will expire on 8 September 2022.

43  Copyright Notice

45     Copyright (c) 2022 IETF Trust and the persons identified as the
46     document authors. All rights reserved.

48     This document is subject to BCP 78 and the IETF Trust's Legal
49     Provisions Relating to IETF Documents (https://trustee.ietf.org/
50     license-info) in effect on the date of publication of this document.
51     Please review these documents carefully, as they describe your rights
52     and restrictions with respect to this document. Code Components
53     extracted from this document must include Revised BSD License text as
54     described in Section 4.e of the Trust Legal Provisions and are
55     provided without warranty as described in the Revised BSD License.

57  Table of Contents

59     1.  Introduction
60     2.  Implementing Connection Objects
61     3.  Implementing Pre-Establishment
62       3.1.  Configuration-time errors
63       3.2.  Role of system policy
64     4.  Implementing Connection Establishment
65       4.1.  Structuring Candidates as a Tree
66         4.1.1.  Branch Types
67         4.1.2.  Branching Order-of-Operations
68         4.1.3.  Sorting Branches
69       4.2.  Candidate Gathering
70         4.2.1.  Gathering Endpoint Candidates
71       4.3.  Candidate Racing
72         4.3.1.  Simultaneous
73         4.3.2.  Staggered
74         4.3.3.  Failover
75       4.4.  Completing Establishment
76         4.4.1.  Determining Successful Establishment
77       4.5.  Establishing multiplexed connections
78       4.6.  Handling connectionless protocols
79       4.7.  Implementing listeners
80         4.7.1.  Implementing listeners for Connected Protocols
81         4.7.2.  Implementing listeners for Connectionless
82                 Protocols
83         4.7.3.  Implementing listeners for Multiplexed Protocols
84     5.  Implementing Sending and Receiving Data
85       5.1.  Sending Messages
86         5.1.1.  Message Properties
87         5.1.2.  Send Completion
88         5.1.3.  Batching Sends
89       5.2.  Receiving Messages
90       5.3.  Handling of data for fast-open protocols
91     6.  Implementing Message Framers
92       6.1.  Defining Message Framers
93       6.2.  Sender-side Message Framing
94       6.3.  Receiver-side Message Framing
95     7.  Implementing Connection Management
96       7.1.  Pooled Connection
97       7.2.  Handling Path Changes
98     8.  Implementing Connection Termination
99     9.  Cached State
100       9.1.  Protocol state caches
101       9.2.  Performance caches
102     10. Specific Transport Protocol Considerations
103       10.1.  TCP
104       10.2.  MPTCP
105       10.3.  UDP
106       10.4.  UDP-Lite
107       10.5.  UDP Multicast Receive
108       10.6.  SCTP
109     11. IANA Considerations
110     12. Security Considerations
111       12.1.  Considerations for Candidate Gathering
112       12.2.  Considerations for Candidate Racing
113     13. Acknowledgements
114     14. References
115       14.1.  Normative References
116       14.2.  Informative References
117     Appendix A.  API Mapping Template
118     Appendix B.  Additional Properties
119       B.1.  Properties Affecting Sorting of Branches
120     Appendix C.  Reasons for errors
121     Appendix D.  Existing Implementations
122     Authors' Addresses

124  1.  Introduction

126     The Transport Services architecture [I-D.ietf-taps-arch] defines a
127     system that allows applications to flexibly use transport networking
128     protocols. The API that such a system exposes to applications is
129     defined as the Transport Services API [I-D.ietf-taps-interface].
130     This API is designed to be generic across multiple transport
131     protocols and sets of protocol features.

133     This document serves as a guide to implementation on how to build a
134     system that provides a Transport Services API. It is the job of an
135     implementation of a Transport Services system to turn the requests of
136     an application into decisions on how to establish connections, and
137     how to transfer data over those connections once established. The
138     terminology used in this document is based on the Architecture
139     [I-D.ietf-taps-arch].

141  2.  Implementing Connection Objects

143     The connection objects that are exposed to applications for Transport
144     Services are:

146     *  the Preconnection, the bundle of Properties that describes the
147        application constraints on, and preferences for, the transport;

149     *  the Connection, the basic object that represents a flow of data as
150        Messages in either direction between the Local and Remote
151        Endpoints;

153     *  and the Listener, a passive waiting object that delivers new
154        Connections.

156     Preconnection objects should be implemented as bundles of properties
157     that an application can both read and write. A Preconnection object
158     influences a Connection only at one point in time: when the
159     Connection is created. Connection objects represent the interface
160     between the application and the implementation to manage transport
161     state, and conduct data transfer. During the process of
162     establishment (Section 4), the Connection will not be bound to a
163     specific transport protocol instance, since multiple candidate
164     Protocol Stacks might be raced.

166     Once a Preconnection has been used to create an outbound Connection
167     or a Listener, the implementation should ensure that the copy of the
168     properties held by the Connection or Listener cannot be mutated by
169     the application making changes to the original Preconnection object.
170     This may involve the implementation performing a deep-copy, copying
171     the object with all the objects that it references.
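
The deep-copy behaviour described above can be illustrated with a minimal sketch. The class names, the initiate method shape, and the property keys below are hypothetical stand-ins chosen for illustration; they are not the API defined in [I-D.ietf-taps-interface], and a real implementation may isolate the properties in some other way.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Preconnection:
    """Hypothetical stand-in: a mutable bundle of properties."""
    remote_endpoint: str
    selection_properties: dict = field(default_factory=dict)

    def initiate(self) -> "Connection":
        # Snapshot the properties, including nested objects, so that later
        # edits to this Preconnection cannot reach the new Connection.
        return Connection(
            remote_endpoint=self.remote_endpoint,
            selection_properties=copy.deepcopy(self.selection_properties),
        )


@dataclass(frozen=True)
class Connection:
    """Hypothetical stand-in: holds its own copy of the properties."""
    remote_endpoint: str
    selection_properties: dict


# Illustrative property names and values only.
pre = Preconnection(
    "www.example.com:443",
    {"reliability": "require", "interface": {"Wi-Fi": "prefer"}},
)
conn = pre.initiate()

# The application changes the original Preconnection after Initiate.
pre.selection_properties["interface"]["Wi-Fi"] = "prohibit"
```

A shallow copy would not be enough in this sketch, because the nested "interface" dictionary would still be shared between the Preconnection and the Connection.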

173     Once the Connection is established, the Transport Services
174     implementation maps actions and events to the details of the chosen
175     Protocol Stack. For example, the same Connection object may
176     ultimately represent a single instance of one transport protocol
177     (e.g., a TCP connection, a TLS session over TCP, a UDP flow with
178     fully-specified Local and Remote Endpoints, a DTLS session, an SCTP
179     stream, a QUIC stream, or an HTTP/2 stream). The properties held by
180     a Connection or Listener are independent of other connections that
181     are not part of the same Connection Group.

183     Connection establishment is only a local operation for a Datagram
184     transport (e.g., UDP(-Lite)), which serves to simplify the local
185     send/receive functions and to filter the traffic for the specified
186     addresses and ports [RFC8085].

188     Once Initiate has been called, the Selection Properties and Endpoint
189     information are immutable (i.e., an application is not able to later
190     modify Selection Properties on the original Preconnection object).
191     Listener objects are created with a Preconnection, at which point
192     their configuration should be considered immutable by the
193     implementation. The process of listening is described in
194     Section 4.7.

196  3.  Implementing Pre-Establishment

198     During pre-establishment the application specifies one or more
199     Endpoints to be used for communication as well as protocol
200     preferences and constraints via Selection Properties and, if desired,
201     also Connection Properties. Generally, Connection Properties should
202     be configured as early as possible, because they can serve as input
203     to decisions that are made by the implementation (e.g., the Capacity
204     Profile can guide usage of a protocol offering scavenger-type
205     congestion control).

207     The implementation stores these properties as a part of the
208     Preconnection object for use during connection establishment. For
209     Selection Properties that are not provided by the application, the
210     implementation must use the default values specified in the Transport
211     Services API ([I-D.ietf-taps-interface]).

213  3.1.  Configuration-time errors

215     The Transport Services system should have a list of supported
216     protocols available, each of which has transport features reflecting
217     the capabilities of the protocol. Once an application specifies its
218     Transport Properties, the transport system matches the required and
219     prohibited properties against the transport features of the available
220     protocols.

222     In the following cases, failure should be detected during pre-
223     establishment:

225     *  A request by an application for Protocol Properties that cannot be
226        satisfied by any of the available protocols. For example, if an
227        application requires "Configure Reliability per Message", but no
228        such feature is available in any protocol on the host running the
229        transport system (e.g., because SCTP is not supported by the
230        operating system), this should result in an error.

233     *  A request by an application for Protocol Properties that are in
234        conflict with each other, i.e., the required and prohibited
235        properties cannot be satisfied by the same protocol. For example,
236        if an application prohibits "Reliable Data Transfer" but then
237        requires "Configure Reliability per Message", this mismatch should
238        result in an error.

240     To avoid allocating resources that are not finally needed, it is
241     important that configuration-time errors are detected as early as
        possible.

243  3.2.  Role of system policy

245     The properties specified during pre-establishment have a close
246     relationship to system policy. The implementation is responsible for
247     combining and reconciling several different sources of preferences
248     when establishing Connections. These include, but are not limited
249     to:

251     1.  Application preferences, i.e., preferences specified during the
252         pre-establishment via Selection Properties.

254     2.  Dynamic system policy, i.e., policy compiled from internally and
255         externally acquired information about available network
256         interfaces, supported transport protocols, and current/previous
257         Connections. Examples of ways to externally retrieve policy-
258         support information are through OS-specific statistics/
259         measurement tools and tools that reside on middleboxes and
260         routers.

262     3.  Default implementation policy, i.e., predefined policy by OS or
263         application.

265     In general, any protocol or path used for a connection must conform
266     to all three sources of constraints. A violation that occurs at any
267     of the policy layers should cause a protocol or path to be considered
268     ineligible for use. As an example of application preferences
269     leading to constraints, an application may prohibit the use of
270     metered network interfaces for a given Connection to avoid user cost.
271     Similarly, the system policy at a given time may prohibit the use of
272     such a metered network interface from the application's process.
273     Lastly, the implementation itself may default to disallowing certain
274     network interfaces unless explicitly requested by the application and
275     allowed by the system.

277     It is expected that the database of system policies and the method of
278     looking up these policies will vary across various platforms. An
279     implementation should attempt to look up the relevant policies for
280     the system in a dynamic way to make sure it is reflecting an accurate
281     version of the system policy, since the system's policy regarding the
282     application's traffic may change over time due to user or
283     administrative changes.

285  4.  Implementing Connection Establishment

287     The process of establishing a network connection begins when an
288     application expresses intent to communicate with a Remote Endpoint by
289     calling Initiate. (At this point, any constraints or requirements
290     the application may have on the connection are available from pre-
291     establishment.) The process can be considered complete once there is
292     at least one Protocol Stack that has completed any required setup to
293     the point that it can transmit and receive the application's data.

295     Connection establishment is divided into two top-level steps:
296     Candidate Gathering, to identify the paths, protocols, and endpoints
297     to use, and Candidate Racing (see Section 4.2.2 of
298     [I-D.ietf-taps-arch]), in which the necessary protocol handshakes are
299     conducted so that the transport system can select which set to use.

301     This document structures the candidates for racing as a tree as a
302     terminological convention. While a tree structure is not the only
303     way in which racing can be implemented, it does ease the illustration
304     of how racing works.

306     The simplest example of this process might involve identifying the
307     single IP address to which the implementation wishes to connect,
308     using the system's current default path (i.e., using the default
309     interface), and starting a TCP handshake to establish a stream to the
310     specified IP address. However, each step may also differ depending
311     on the requirements of the connection: if the endpoint is defined as
312     a hostname and port, then there may be multiple resolved addresses
313     that are available; there may also be multiple paths available (in
314     this case using an interface other than the default system
315     interface); and some protocols may not need any transport handshake
316     to be considered "established" (such as UDP), while other connections
317     may utilize layered protocol handshakes, such as TLS over TCP.

319     Whenever an implementation has multiple options for connection
320     establishment, it can view the set of all individual connection
321     establishment options as a single, aggregate connection
322     establishment. The aggregate set conceptually includes every valid
323     combination of endpoints, paths, and protocols. As an example,
324     consider an implementation that initiates a TCP connection to a
325     hostname + port endpoint, and has two valid interfaces available (Wi-
326     Fi and LTE). The hostname resolves to a single IPv4 address on the
327     Wi-Fi network, and resolves to the same IPv4 address on the LTE
328     network, as well as a single IPv6 address. The aggregate set of
329     connection establishment options can be viewed as follows:

331     Aggregate [Endpoint: www.example.com:80] [Interface: Any] [Protocol: TCP]
332     |-> [Endpoint: 192.0.2.1:80]       [Interface: Wi-Fi] [Protocol: TCP]
333     |-> [Endpoint: 192.0.2.1:80]       [Interface: LTE]   [Protocol: TCP]
334     |-> [Endpoint: 2001:DB8::1.80]     [Interface: LTE]   [Protocol: TCP]

336     Any one of these sub-entries on the aggregate connection attempt
337     would satisfy the original application intent. The concern of this
338     section is the algorithm defining which of these options to try,
339     when, and in what order.

341     During Candidate Gathering, an implementation first excludes all
342     protocols and paths that match a Prohibit or do not match all Require
343     properties. Then, the implementation will sort branches according to
344     Preferred properties, Avoided properties, and possibly other
345     criteria.

347  4.1.  Structuring Candidates as a Tree

349     As noted above, the consideration of multiple candidates in a
350     gathering and racing process can be conceptually structured as a
351     tree; this terminological convention is used throughout this
352     document.
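
As a rough illustration of this convention, the following sketch models a candidate tree in which each node carries an endpoint, a path, and a protocol, and only leaf nodes are actual connection attempts. The CandidateNode class and its method names are hypothetical, introduced here for illustration only. The example data are the three options from the aggregate listing above, and grouping them by interface is an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class CandidateNode:
    """Hypothetical tree node: one (endpoint, path, protocol) tuple."""
    endpoint: str
    path: str
    protocol: str
    children: List["CandidateNode"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children

    def leaves(self) -> Iterator["CandidateNode"]:
        # Depth-first walk; only leaves are concrete connection attempts.
        if self.is_leaf():
            yield self
        for child in self.children:
            yield from child.leaves()


# The aggregate example, grouped by interface (an assumed grouping).
root = CandidateNode("www.example.com:80", "Any", "TCP", [
    CandidateNode("www.example.com:80", "Wi-Fi", "TCP", [
        CandidateNode("192.0.2.1:80", "Wi-Fi", "TCP"),
    ]),
    CandidateNode("www.example.com:80", "LTE", "TCP", [
        CandidateNode("192.0.2.1:80", "LTE", "TCP"),
        CandidateNode("2001:DB8::1.80", "LTE", "TCP"),
    ]),
])

attempts = [(node.endpoint, node.path) for node in root.leaves()]
```

Keeping children in an ordered list means that a depth-first walk visits leaves in the order in which the branches were ranked, which is one simple way to derive an attempt order from the tree.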

354     Each leaf node of the tree represents a single, coherent connection
355     attempt, with an endpoint, a network path, and a set of protocols
356     that can directly negotiate and send data on the network. Each node
357     in the tree that is not a leaf represents a connection attempt that
358     is either underspecified, or else includes multiple distinct options.
359     For example, when connecting on an IP network, a connection attempt
360     to a hostname and port is underspecified, because the connection
361     attempt requires a resolved IP address as its Remote Endpoint. In
362     this case, the node represented by the connection attempt to the
363     hostname is a parent node, with child nodes for each IP address.
364     Similarly, an implementation that is allowed to connect using
365     multiple interfaces will have a parent node of the tree for the
366     decision between the network paths, with a branch for each interface.

368     The example aggregate connection attempt above can be drawn as a tree
369     by grouping the addresses resolved on the same interface into
370     branches:

372                                ||
373                   +==========================+
374                   |  www.example.com:80/Any  |
375                   +==========================+
376                    //                      \\
377  +==========================+       +==========================+
378  | www.example.com:80/Wi-Fi |       |  www.example.com:80/LTE  |
379  +==========================+       +==========================+
380              ||                       //                    \\
381  +====================+  +====================+  +======================+
382  | 192.0.2.1:80/Wi-Fi |  |  192.0.2.1:80/LTE  |  |  2001:DB8::1.80/LTE  |
383  +====================+  +====================+  +======================+

385     The rest of this section will use a notation scheme to represent this
386     tree. The parent (or trunk) node of the tree will be represented by
387     a single integer, such as "1". Each child of that node will have an
388     integer that identifies it, from 1 to the number of children. That
389     child node will be uniquely identified by concatenating its integer
390     to its parent's identifier with a dot in between, such as "1.1" and
391     "1.2". Each node will be summarized by a tuple of three elements:
392     endpoint, path (labeled here by interface), and protocol. The above
393     example can now be written more succinctly as:

395     1 [www.example.com:80, Any, TCP]
396       1.1 [www.example.com:80, Wi-Fi, TCP]
397         1.1.1 [192.0.2.1:80, Wi-Fi, TCP]
398       1.2 [www.example.com:80, LTE, TCP]
399         1.2.1 [192.0.2.1:80, LTE, TCP]
400         1.2.2 [2001:DB8::1.80, LTE, TCP]

402     When an implementation views this aggregate set of connection
403     attempts as a single connection establishment, it will use only one
404     of the leaf nodes to transfer data. Thus, when a single leaf node
405     becomes ready to use, the entire connection attempt is ready for
406     use by the application. Another way to represent this is that every
407     leaf node updates the state of its parent node when it becomes ready,
408     until the trunk node of the tree is ready, which then notifies the
409     application that the connection as a whole is ready to use.

411     A connection establishment tree may be degenerate, and only have a
412     single leaf node, such as a connection attempt to an IP address over
413     a single interface with a single protocol.

415     1 [192.0.2.1:80, Wi-Fi, TCP]

417     A parent node may also only have one child (or leaf) node, such as
418     when a hostname resolves to only a single IP address.

420     1 [www.example.com:80, Wi-Fi, TCP]
421       1.1 [192.0.2.1:80, Wi-Fi, TCP]

423  4.1.1.  Branch Types

425     There are three types of branching from a parent node into one or
426     more child nodes. Any parent node of the tree must only use one type
427     of branching.

429  4.1.1.1.  Derived Endpoints

431     If a connection originally targets a single endpoint, there may be
432     multiple endpoints of different types that can be derived from the
433     original. This creates an ordered list of the derived endpoints
434     according to application preference, system policy and expected
435     performance.

437     DNS hostname-to-address resolution is the most common method of
438     endpoint derivation. When trying to connect to a hostname endpoint
439     on a traditional IP network, the implementation should send DNS
440     queries for both A (IPv4) and AAAA (IPv6) records if both are
441     supported on the local interface. The algorithm for ordering and
442     racing these addresses should follow the recommendations in Happy
443     Eyeballs [RFC8305].

445     1 [www.example.com:80, Wi-Fi, TCP]
446       1.1 [2001:DB8::1.80, Wi-Fi, TCP]
447       1.2 [192.0.2.1:80, Wi-Fi, TCP]
448       1.3 [2001:DB8::2.80, Wi-Fi, TCP]
449       1.4 [2001:DB8::3.80, Wi-Fi, TCP]

451     DNS-Based Service Discovery [RFC6763] can also provide an endpoint
452     derivation step. When trying to connect to a named service, the
453     client may discover one or more hostname and port pairs on the local
454     network using multicast DNS [RFC6762]. These hostnames should each
455     be treated as a branch that can be attempted independently from other
456     hostnames. Each of these hostnames might resolve to one or more
457     addresses, which would create multiple layers of branching.

459     1 [term-printer._ipp._tcp.meeting.ietf.org, Wi-Fi, TCP]
460       1.1 [term-printer.meeting.ietf.org:631, Wi-Fi, TCP]
461         1.1.1 [31.133.160.18:631, Wi-Fi, TCP]

463     Applications can influence which derived endpoints are allowed and
464     preferred via Selection Properties set on the Preconnection. For
465     example, setting a preference for useTemporaryLocalAddress would
466     prefer the use of IPv6 over IPv4, and requiring
467     useTemporaryLocalAddress would eliminate IPv4 options, since IPv4
468     does not support temporary addresses.

470  4.1.1.2.  Alternate Paths

472     If a client has multiple network paths available to it, e.g., a
473     mobile client with interfaces for both Wi-Fi and Cellular
474     connectivity, it can attempt a connection over any of the paths.
475     This represents a branch point in the connection establishment.
476     Similar to a derived endpoint, the paths should be ranked based on
477     preference, system policy, and performance. Attempts should be
478     started on one path (e.g., a specific interface), and then
479     successively on other paths (or interfaces) after delays based on
480     expected path round-trip-time or other available metrics.

482     1 [192.0.2.1:80, Any, TCP]
483       1.1 [192.0.2.1:80, Wi-Fi, TCP]
484       1.2 [192.0.2.1:80, LTE, TCP]

486     This same approach applies to any situation in which the client is
487     aware of multiple links or views of the network. A single interface
488     may be shared by multiple network paths, each with a coherent set of
489     addresses, routes, DNS server, and more. A path may also represent a
490     virtual interface service such as a Virtual Private Network (VPN).

492     The list of available paths should be constrained by any requirements
493     the application sets, as well as by the system policy.

495  4.1.1.3.  Protocol Options

497     Differences in possible protocol compositions and options can also
498     provide a branching point in connection establishment. This allows
499     clients to be resilient to situations in which a certain protocol is
500     not functioning on a server or network.

502     This approach is commonly used for connections with optional proxy
503     server configurations. A single connection might have several
504     options available: an HTTP-based proxy, a SOCKS-based proxy, or no
505     proxy. These options should be ranked and attempted in succession.

507     1 [www.example.com:80, Any, HTTP/TCP]
508       1.1 [192.0.2.8:80, Any, HTTP/HTTP Proxy/TCP]
509       1.2 [192.0.2.7:10234, Any, HTTP/SOCKS/TCP]
510       1.3 [www.example.com:80, Any, HTTP/TCP]
511         1.3.1 [192.0.2.1:80, Any, HTTP/TCP]

513     This approach also allows a client to attempt different sets of
514     application and transport protocols that, when available, could
515     provide preferable features. For example, the protocol options could
516     involve QUIC [I-D.ietf-quic-transport] over UDP on one branch, and
517     HTTP/2 [RFC7540] over TLS over TCP on the other:

519     1 [www.example.com:443, Any, Any HTTP]
520       1.1 [www.example.com:443, Any, QUIC/UDP]
521         1.1.1 [192.0.2.1:443, Any, QUIC/UDP]
522       1.2 [www.example.com:443, Any, HTTP2/TLS/TCP]
523         1.2.1 [192.0.2.1:443, Any, HTTP2/TLS/TCP]

525     Another example is racing SCTP with TCP:

527     1 [www.example.com:80, Any, Any Stream]
528       1.1 [www.example.com:80, Any, SCTP]
529         1.1.1 [192.0.2.1:80, Any, SCTP]
530       1.2 [www.example.com:80, Any, TCP]
531         1.2.1 [192.0.2.1:80, Any, TCP]

533     Implementations that support racing protocols and protocol options
534     should maintain a history of which protocols and protocol options
535     successfully established, on a per-network and per-endpoint basis
536     (see Section 9.2). This information can influence future racing
537     decisions to prioritize or prune branches.

539  4.1.2.  Branching Order-of-Operations

541     Branch types must occur in a specific order relative to one another
542     to avoid creating leaf nodes with invalid or incompatible settings.
543     In the example above, it would be invalid to branch for derived
544     endpoints (the DNS results for www.example.com) before branching
545     between interface paths, since there are situations when the results
546     will be different across networks due to private names or different
547     supported IP versions. Implementations must be careful to branch in
548     an order that results in usable leaf nodes whenever there are
549     multiple branch types that could be used from a single node.

551     The order of operations for branching should be:

553     1.  Alternate Paths

554     2.  Protocol Options

556     3.  Derived Endpoints

558     where a lower number indicates higher precedence and therefore higher
559     placement in the tree. Branching between paths is the first in the
560     list because results across multiple interfaces are likely not
561     related to one another: endpoint resolution may return different
562     results, especially when using locally resolved host and service
563     names, and which protocols are supported and preferred may differ
564     across interfaces. Thus, if multiple paths are attempted, the
565     overall connection can be seen as a race between the available paths
566     or interfaces.

568     Protocol options are next checked in order. Whether or not a set of
569     protocols, or protocol-specific options, can successfully connect is
570     generally not dependent on which specific IP address is used.
571     Furthermore, the protocol stacks being attempted may influence or
572     altogether change the endpoints being used. Adding a proxy to a
573     connection's branch will change the endpoint to the proxy's IP
574     address or hostname. Choosing an alternate protocol may also modify
575     the ports that should be selected.

577     Branching for derived endpoints is the final step, and may have
578     multiple layers of derivation or resolution, such as DNS service
579     resolution and DNS hostname resolution.

581     For example, if the application has indicated both a preference for
582     Wi-Fi over LTE and for a feature only available in SCTP, branches will
583     be first sorted according to path selection, with Wi-Fi at the top.
584     Then, branches with SCTP will be sorted to the top within their
585     subtree according to the properties influencing protocol selection.
586 However, if the implementation has current cache information that 587 SCTP is not available on the path over WiFi, there is no SCTP node in 588 the WiFi subtree. Here, the path over WiFi will be tried first, and, 589 if connection establishment succeeds, TCP will be used. So the 590 Selection Property of preferring WiFi takes precedence over the 591 Property that led to a preference for SCTP. 593 1. [www.example.com:80, Any, Any Stream] 594 1.1 [192.0.2.1:80, Wi-Fi, Any Stream] 595 1.1.1 [192.0.2.1:80, Wi-Fi, TCP] 596 1.2 [192.0.3.1:80, LTE, Any Stream] 597 1.2.1 [192.0.3.1:80, LTE, SCTP] 598 1.2.2 [192.0.3.1:80, LTE, TCP] 600 4.1.3. Sorting Branches 602 Implementations should sort the branches of the tree of connection 603 options in order of their preference rank, from most preferred to 604 least preferred. Leaf nodes on branches with higher rankings 605 represent connection attempts that will be raced first. 606 Implementations should order the branches to reflect the preferences 607 expressed by the application for its new connection, including 608 Selection Properties, which are specified in 609 [I-D.ietf-taps-interface]. 611 In addition to the properties provided by the application, an 612 implementation may include additional criteria such as cached 613 performance estimates, see Section 9.2, or system policy, see 614 Section 3.2, in the ranking. Two examples of how Selection and 615 Connection Properties may be used to sort branches are provided 616 below: 618 * "Interface Instance or Type": If the application specifies an 619 interface type to be preferred or avoided, implementations should 620 accordingly rank the paths. If the application specifies an 621 interface type to be required or prohibited, an implementation is 622 expeceted to not include the non-conforming paths. 624 * "Capacity Profile": An implementation can use the Capacity Profile 625 to prefer paths that match an application's expected traffic 626 pattern. 
This match will use cached performance estimates, see 627 Section 9.2: 629 - Scavenger: Prefer paths with the highest expected available 630 capacity, while minimising impact on other traffic, based on the 631 observed maximum throughput; 633 - Low Latency/Interactive: Prefer paths with the lowest expected 634 Round Trip Time, based on observed round trip time estimates; 636 - Low Latency/Non-Interactive: Prefer paths with a low expected 637 Round Trip Time, but can tolerate delay variation; 639 - Constant-Rate Streaming: Prefer paths that are expected to 640 satisfy the requested Stream Send or Stream Receive Bitrate, 641 based on the observed maximum throughput; 643 - Capacity-Seeking: Prefer adapting to paths to determine the 644 highest available capacity, based on the observed maximum 645 throughput. 647 Implementations process the Properties in the following order: 648 Prohibit, Require, Prefer, Avoid. If Selection Properties contain 649 any prohibited properties, the implementation should first purge 650 branches containing nodes with these properties. For required 651 properties, it should only keep branches that satisfy these 652 requirements. Next, it should order the branches according to the 653 preferred properties, and finally use any avoided properties as a 654 tiebreaker. When ordering branches, an implementation can give more 655 weight to properties that the application has explicitly set than to 656 properties left at their default values. 658 The available protocols and paths on a specific system and in a 659 specific context can change; therefore, the result of sorting and the 660 outcome of racing may vary, even when using the same Selection and 661 Connection Properties. However, an implementation ought to provide a 662 consistent outcome to applications, e.g., by preferring protocols and 663 paths that are already used by existing Connections that specified 664 similar Properties. 666 4.2.
Candidate Gathering 668 The step of gathering candidates involves identifying which paths, 669 protocols, and endpoints may be used for a given Connection. This 670 list is determined by the requirements, prohibitions, and preferences 671 of the application as specified in the Selection Properties. 673 4.2.1. Gathering Endpoint Candidates 675 Both Local and Remote Endpoint Candidates must be discovered during 676 connection establishment. To support Interactive Connectivity 677 Establishment (ICE) [RFC8445], or similar protocols that involve out- 678 of-band indirect signalling to exchange candidates with the Remote 679 Endpoint, it is important to query the set of candidate Local 680 Endpoints, and provide the protocol stack with a set of candidate 681 Remote Endpoints, before the Local Endpoint attempts to establish 682 connections. 684 4.2.1.1. Local Endpoint Candidates 686 The set of possible Local Endpoints is gathered. In the simple case, 687 this merely enumerates the local interfaces and protocols, and 688 allocates ephemeral source ports. For example, a system that has 689 WiFi and Ethernet and supports IPv4 and IPv6 might gather four 690 candidate Local Endpoints (IPv4 on Ethernet, IPv6 on Ethernet, IPv4 691 on WiFi, and IPv6 on WiFi) that can form the source for a transient. 693 If NAT traversal is required, the process of gathering Local 694 Endpoints becomes broadly equivalent to the ICE candidate gathering 695 phase (see Section 5.1.1 of [RFC8445]). The endpoint determines its 696 server reflexive Local Endpoints (i.e., the translated address of a 697 Local Endpoint, on the other side of a NAT, e.g., via a STUN server 698 [RFC5389]) and relayed Local Endpoints (e.g., via a TURN server 699 [RFC5766] or other relay), for each interface and network protocol. 700 These are added to the set of candidate Local Endpoints for this 701 connection.
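
The enumeration described above can be illustrated with a minimal Python sketch. It is not part of the Transport Services API: the names LocalEndpointCandidate, gather_host_candidates, and add_nat_candidates are hypothetical, and the two lookup callables merely stand in for real STUN and TURN exchanges, which are not modelled here.

```python
from dataclasses import dataclass, replace
from itertools import product


@dataclass(frozen=True)
class LocalEndpointCandidate:
    """Hypothetical record describing one candidate Local Endpoint."""
    interface: str       # e.g., "Ethernet" or "WiFi"
    ip_version: str      # e.g., "IPv4" or "IPv6"
    kind: str = "host"   # "host", "server-reflexive", or "relayed"


def gather_host_candidates(interfaces, ip_versions):
    """Simple case: one candidate per (interface, IP version) pair."""
    return [LocalEndpointCandidate(iface, version)
            for iface, version in product(interfaces, ip_versions)]


def add_nat_candidates(hosts, has_reflexive, has_relay):
    """Add server reflexive and relayed candidates for each host candidate.

    has_reflexive and has_relay are assumed callables that report whether
    a STUN or TURN exchange produced an address for the given candidate.
    """
    candidates = list(hosts)
    for host in hosts:
        if has_reflexive(host):
            candidates.append(replace(host, kind="server-reflexive"))
        if has_relay(host):
            candidates.append(replace(host, kind="relayed"))
    return candidates
```

With interfaces ["Ethernet", "WiFi"] and IP versions ["IPv4", "IPv6"], gather_host_candidates returns the four candidates named in the example above.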
703 Gathering Local Endpoints is primarily a local operation, although it 704 might involve exchanges with a STUN server to derive server reflexive 705 Local Endpoints, or with a TURN server or other relay to derive 706 relayed Local Endpoints. However, it does not involve communication 707 with the Remote Endpoint. 709 4.2.1.2. Remote Endpoint Candidates 711 The Remote Endpoint is typically a name that needs to be resolved 712 into a set of possible addresses that can be used for communication. 713 Resolving the Remote Endpoint is the process of recursively 714 performing such name lookups, until fully resolved, to return the set 715 of candidates for the Remote Endpoint of this connection. 717 How this resolution is done will depend on the type of the Remote 718 Endpoint, and can also be specific to each Local Endpoint. A common 719 case is when the Remote Endpoint is a DNS name, in which case it is 720 resolved to give a set of IPv4 and IPv6 addresses representing that 721 name. Some types of Remote Endpoint might require more complex 722 resolution. Resolving the Remote Endpoint for a peer-to-peer 723 connection might involve communication with a rendezvous server, 724 which in turn contacts the peer to gain consent to communicate and 725 retrieve its set of candidate Local Endpoints, which are returned and 726 form the candidate remote addresses for contacting that peer. 728 Resolving the Remote Endpoint is not a local operation. It will 729 involve a directory service, and can require communication with the 730 Remote Endpoint to rendezvous and exchange peer addresses. This can 731 expose some or all of the candidate Local Endpoints to the Remote 732 Endpoint. 734 4.3. Candidate Racing 736 The primary goal of the Candidate Racing process is to successfully 737 negotiate a protocol stack to an endpoint over an interface to 738 connect a single leaf node of the tree with as little delay and as 739 few unnecessary connection attempts as possible.
Optimizing these 740 two factors improves the user experience, while minimizing network 741 load. 743 This section covers the dynamic aspect of connection establishment. 744 The tree described above is a useful conceptual and architectural 745 model. However, an implementation is unable to know the full tree 746 before it is formed and many of the possible branches ultimately 747 might not be used. 749 There are three different approaches to racing the attempts for 750 different nodes of the connection establishment tree: 752 1. Simultaneous 754 2. Staggered 756 3. Failover 758 Each approach is appropriate in different use-cases and branch types. 759 However, to avoid consuming unnecessary network resources, 760 implementations should not use simultaneous racing as a default 761 approach. 763 The timing algorithms for racing should remain independent across 764 branches of the tree. Any timers or racing logic is isolated to a 765 given parent node, and is not ordered precisely with regards to other 766 children of other nodes. 768 4.3.1. Simultaneous 770 Simultaneous racing is when multiple alternate branches are started 771 without waiting for any one branch to make progress before starting 772 the next alternative. This means the attempts are effectively 773 simultaneous. Simultaneous racing should be avoided by 774 implementations, since it consumes extra network resources and 775 establishes state that might not be used. 777 4.3.2. Staggered 779 Staggered racing can be used whenever a single node of the tree has 780 multiple child nodes. Based on the order determined when building 781 the tree, the first child node will be initiated immediately, 782 followed by the next child node after some delay. Once that second 783 child node is initiated, the third child node (if present) will begin 784 after another delay, and so on until all child nodes have been 785 initiated, or one of the child nodes successfully completes its 786 negotiation. 
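
The staggered pattern described above can be sketched with Python's asyncio. This is only an illustration under stated assumptions: staggered_race and make_attempt are hypothetical names, each child is modelled as a coroutine that returns on success or raises on failure, and a single fixed delay is used where a real implementation would derive it from dynamic information. The sketch also starts the next child early when a running child fails, as this section recommends, and never cancels an earlier attempt merely because a later one has started.

```python
import asyncio


def make_attempt(name, seconds, succeed=True):
    """Hypothetical stand-in for a real handshake on one child node."""
    async def attempt():
        await asyncio.sleep(seconds)
        if not succeed:
            raise ConnectionError(name + " failed")
        return name
    return attempt


async def staggered_race(attempts, delay):
    """Start children in preference order, `delay` seconds apart.

    attempts: zero-argument coroutine functions, most preferred first.
    Returns the result of the first child that succeeds, or raises
    ConnectionError when every child has failed.
    """
    remaining = list(attempts)
    running = set()
    try:
        while remaining or running:
            if remaining:
                # Start the next child; earlier children keep running.
                running.add(asyncio.create_task(remaining.pop(0)()))
            # Wake up when the stagger delay expires or a child finishes.
            # Once no children are left to start, wait without a timeout.
            timeout = delay if remaining else None
            done, running = await asyncio.wait(
                running, timeout=timeout,
                return_when=asyncio.FIRST_COMPLETED)
            for task in done:
                if task.exception() is None:
                    return task.result()
            # A failure or an expired delay falls through to the next child.
        raise ConnectionError("all candidates failed")
    finally:
        # This sketch simply cancels the losing attempts on exit.
        for task in running:
            task.cancel()
```

For instance, asyncio.run(staggered_race([make_attempt("slow", 0.2), make_attempt("fast", 0.01)], 0.05)) starts "fast" 50 ms after "slow" and returns "fast", because it completes first.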
788 Staggered racing attempts can proceed in parallel. Implementations 789 should not terminate an earlier child connection attempt upon 790 starting a secondary child. 792 If a child node fails to establish connectivity (as in Section 4.4.1) 793 before the delay time has expired for the next child, the next child 794 should be started immediately. 796 Staggered racing between IP addresses for a generic Connection should 797 follow the Happy Eyeballs algorithm described in [RFC8305]. 798 [RFC8421] provides guidance for racing when performing Interactive 799 Connectivity Establishment (ICE). 801 Generally, the delay before starting a given child node ought to be 802 based on the length of time the previously started child node is 803 expected to take before it succeeds or makes progress in connection 804 establishment. Algorithms like Happy Eyeballs choose a delay based 805 on how long the transport connection handshake is expected to take. 806 When performing staggered races in multiple branch types (such as 807 racing between network interfaces, and then racing between IP 808 addresses), a longer delay may be chosen for some branch types. For 809 example, when racing between network interfaces, the delay should 810 also take into account the amount of time it takes to prepare the 811 network interface (such as radio association) and name resolution 812 over that interface, in addition to the delay that would be added for 813 a single transport connection handshake. 815 Since the staggered delay can be chosen based on dynamic information, 816 such as predicted round-trip time, implementations should define 817 upper and lower bounds for delay times. These bounds are 818 implementation-specific, and may differ based on which branch type is 819 being used. 821 4.3.3. Failover 823 If an implementation or application has a strong preference for one 824 branch over another, the branching node may choose to wait until one 825 child has failed before starting the next. 
Failure of a leaf node is 826 determined by its protocol negotiation failing or timing out; failure 827 of a parent branching node is determined by all of its children 828 failing. 830 An example in which failover is recommended is a race between a 831 protocol stack that uses a proxy and a protocol stack that bypasses 832 the proxy. Failover is useful in case the proxy is down or 833 misconfigured, but any more aggressive type of racing may end up 834 unnecessarily avoiding a proxy that was preferred by policy. 836 4.4. Completing Establishment 838 The process of connection establishment completes when one leaf node 839 of the tree has successfully completed negotiation with the Remote 840 Endpoint, or else all nodes of the tree have failed to connect. The 841 first leaf node to complete its connection is then used by the 842 application to send and receive data. 844 Successes and failures of a given attempt should be reported up to 845 parent nodes (towards the trunk of the tree). For example, in the 846 following case, if 1.1.1 fails to connect, it reports the failure to 847 1.1. Since 1.1 has no other child nodes, it also has failed and 848 reports that failure to 1. Because 1.2 has not yet failed, 1 is not 849 considered to have failed. Since 1.2 has not yet started, it is 850 started and the process continues. Similarly, if 1.1.1 successfully 851 connects, then it marks 1.1 as connected, which propagates to the 852 trunk node 1. At this point, the connection as a whole is considered 853 to be successfully connected and ready to process application data. 855 1 [www.example.com:80, Any, TCP] 856 1.1 [www.example.com:80, Wi-Fi, TCP] 857 1.1.1 [192.0.2.1:80, Wi-Fi, TCP] 858 1.2 [www.example.com:80, LTE, TCP] 859 ... 861 If a leaf node has successfully completed its connection, all other 862 attempts should be made ineligible for use by the application for the 863 original request. 
New connection attempts that involve transmitting 864 data on the network ought not to be started after another leaf node 865 has already successfully completed, because the connection as a whole 866 has now been established. An implementation may choose to let 867 certain handshakes and negotiations complete in order to gather 868 metrics to influence future connections. Keeping additional 869 connections is generally not recommended since those attempts were 870 slower to connect and may exhibit less desirable properties. 872 4.4.1. Determining Successful Establishment 874 Implementations may select the criteria by which a leaf node is 875 considered to be successfully connected differently on a per-protocol 876 basis. If the only protocol being used is a transport protocol with 877 a clear handshake, like TCP, then the obvious choice is to declare 878 that node "connected" when the last packet of the three-way handshake 879 has been received. If the only protocol being used is a 880 connectionless protocol, like UDP, the implementation may consider 881 the node fully "connected" the moment it determines a route is 882 present, before sending any packets on the network; see 883 Section 4.6 for further discussion. 885 For protocol stacks with multiple handshakes, the decision becomes 886 more nuanced. If the protocol stack involves both TLS and TCP, an 887 implementation could determine that a leaf node is connected after 888 the TCP handshake is complete, or it can wait for the TLS handshake 889 to complete as well. The benefit of declaring completion when the 890 TCP handshake finishes, and thus stopping the race for other branches 891 of the tree, is a reduced burden on the network and Remote Endpoints 892 from further connection attempts that are likely to be abandoned.
On 893 the other hand, by waiting until the TLS handshake is complete, an 894 implementation avoids the scenario in which a TCP handshake completes 895 quickly, but TLS negotiation is either very slow or fails altogether 896 in particular network conditions or to a particular endpoint. To 897 avoid the issue of TLS possibly failing, the implementation should 898 not generate a Ready event for the Connection until TLS is 899 established. 901 If all of the leaf nodes fail to connect during racing, i.e. none of 902 the configurations that satisfy all requirements given in the 903 Transport Properties actually work over the available paths, then the 904 transport system should notify the application with an InitiateError 905 event. An InitiateError event should also be generated in case the 906 transport system finds no usable candidates to race. 908 4.5. Establishing multiplexed connections 910 Multiplexing several Connections over a single underlying transport 911 connection requires that the Connections to be multiplexed belong to 912 the same Connection Group (as is indicated by the application using 913 the Clone call). When the underlying transport connection supports 914 multi-streaming, the Transport Services System can map each 915 Connection in the Connection Group to a different stream. Thus, when 916 the Connections that are offered to an application by the Transport 917 Services API are multiplexed, the Transport Services implementation 918 can establish a new Connection by simply beginning to use a new 919 stream of an already established transport Connection and there is no 920 need for a connection establishment procedure. This, then, also 921 means that there may not be any "establishment" message (like a TCP 922 SYN), but the application can simply start sending or receiving. 
923 Therefore, when the Initiate action of a Transport Services API is 924 called without Messages being handed over, it cannot be guaranteed 925 that the Remote Endpoint will have any way to know about this, and 926 hence a passive endpoint's ConnectionReceived event might not be 927 called until data is received. Instead, calling the 928 ConnectionReceived event could be delayed until the first Message 929 arrives. 931 4.6. Handling connectionless protocols 933 While protocols that use an explicit handshake to validate a 934 Connection to a peer can be used for racing multiple establishment 935 attempts in parallel, connectionless protocols such as raw UDP do not 936 offer a way to validate the presence of a peer or the usability of a 937 Connection without application feedback. An implementation should 938 consider such a protocol stack to be established as soon as the 939 Transport Services system has selected a path on which to send data. 941 However, if a peer is not reachable over the network using the 942 connectionless protocol, or data cannot be exchanged for any other 943 reason, the application may want to attempt using another candidate 944 Protocol Stack. The implementation should maintain the list of other 945 candidate Protocol Stacks that were eligible to use. 947 4.7. Implementing listeners 949 When an implementation is asked to Listen, it registers with the 950 system to wait for incoming traffic to the Local Endpoint. If no 951 Local Endpoint is specified, the implementation should use an 952 ephemeral port. 954 If the Selection Properties do not require a single network interface 955 or path, but allow the use of multiple paths, the Listener object 956 should register for incoming traffic on all of the network interfaces 957 or paths that conform to the Properties. 
The set of available paths 958 can change over time, so the implementation should monitor network 959 path changes, and change the registration of the Listener across all 960 usable paths as appropriate. When using multiple paths, the Listener 961 is generally expected to use the same port for listening on each. 963 If the Selection Properties allow multiple protocols to be used for 964 listening, and the implementation supports it, the Listener object 965 should support receiving inbound connections for each eligible 966 protocol on each eligible path. 968 4.7.1. Implementing listeners for Connected Protocols 970 Connected protocols such as TCP and TLS-over-TCP have a strong 971 mapping between the Local and Remote Endpoints (four-tuple) and their 972 protocol connection state. These map into Connection objects. 973 Whenever a new inbound handshake is being started, the Listener 974 should generate a new Connection object and pass it to the 975 application. 977 4.7.2. Implementing listeners for Connectionless Protocols 979 Connectionless protocols such as UDP and UDP-Lite generally do not 980 provide the same mechanisms that connected protocols do to offer 981 Connection objects. Implementations should wait for incoming packets 982 for connectionless protocols on a listening port and should perform 983 four-tuple matching of packets, either delivering them to existing Connection objects 984 or creating new Connection objects. On platforms with 985 facilities to create a "virtual connection" for connectionless 986 protocols, implementations should use these mechanisms to minimise the 987 handling of datagrams intended for already created Connection 988 objects. 990 4.7.3.
Implementing listeners for Multiplexed Protocols 992 Protocols that provide multiplexing of streams into a single four- 993 tuple can listen both for entirely new connections (a new HTTP/2 994 stream on a new TCP connection, for example) and for new sub- 995 connections (a new HTTP/2 stream on an existing connection). If the 996 abstraction of Connection presented to the application is mapped to 997 the multiplexed stream, then the Listener should deliver new 998 Connection objects in the same way for either case. The 999 implementation should allow the application to introspect the 1000 Connection Group marked on the Connections to determine the grouping 1001 of the multiplexing. 1003 5. Implementing Sending and Receiving Data 1005 The most basic mapping for sending a Message is an abstraction of 1006 datagrams, in which the transport protocol naturally deals in 1007 discrete packets. Each Message here corresponds to a single 1008 datagram. Generally, these will be short enough that sending and 1009 receiving will always use a complete Message. 1011 For protocols that expose byte-streams, the only delineation provided 1012 by the protocol is the end of the stream in a given direction. Each 1013 Message in this case corresponds to the entire stream of bytes in a 1014 direction. These Messages may be quite long, in which case they can 1015 be sent in multiple parts. 1017 Protocols that provide the framing (such as length-value protocols, 1018 or protocols that use delimiters) may support Message sizes that do 1019 not fit within a single datagram. Each Message for framing protocols 1020 corresponds to a single frame, which may be sent either as a complete 1021 Message in the underlying protocol, or in multiple parts. 1023 5.1. Sending Messages 1025 The effect of the application sending a Message is determined by the 1026 top-level protocol in the established Protocol Stack. 
That is, if 1027 the top-level protocol provides an abstraction of framed messages 1028 over a connection, the receiving application will be able to obtain 1029 multiple Messages on that connection, even if the framing protocol is 1030 built on a byte-stream protocol like TCP. 1032 5.1.1. Message Properties 1034 * Lifetime: this should be implemented by removing the Message from 1035 the queue of pending Messages after the Lifetime has expired. A 1036 queue of pending Messages within the transport system 1037 implementation that have yet to be handed to the Protocol Stack 1038 can always support this property, but once a Message has been sent 1039 into the send buffer of a protocol, only certain protocols may 1040 support removing a message. For example, an implementation cannot 1041 remove bytes from a TCP send buffer, while it can remove data from 1042 a SCTP send buffer using the partial reliability extension 1043 [RFC8303]. When there is no standing queue of Messages within the 1044 system, and the Protocol Stack does not support the removal of a 1045 Message from the stack's send buffer, this property may be 1046 ignored. 1048 * Priority: this represents the ability to prioritize a Message over 1049 other Messages. This can be implemented by the system re-ordering 1050 Messages that have yet to be handed to the Protocol Stack, or by 1051 giving relative priority hints to protocols that support 1052 priorities per Message. For example, an implementation of HTTP/2 1053 could choose to send Messages of different Priority on streams of 1054 different priority. 1056 * Ordered: when this is false, this disables the requirement of in- 1057 order-delivery for protocols that support configurable ordering. 1058 When the protocol stack does not support configurable ordering, 1059 this property may be ignored. 
1061 * Safely Replayable: when this is true, this means that the Message 1062 can be used by a transport mechanism that might transfer it 1063 multiple times -- e.g., as a result of racing multiple transports 1064 or as part of TCP Fast Open. Also, protocols that do not protect 1065 against duplicated messages, such as UDP (when used directly, 1066 without a protocol layered atop), can only be used with Messages 1067 that are Safely Replayable. When a transport system is permitted 1068 to replay messages, replay protection could be provided by the 1069 application. 1071 * Final: when this is true, this means that the sender will not send 1072 any further messages. The Connection need not be closed (in case 1073 the Protocol Stack supports half-close operation, like TCP). Any 1074 messages sent after a Final message will result in a SendError. 1076 * Corruption Protection Length: when this is set to any value other 1077 than Full Coverage, it sets the minimum protection in protocols 1078 that allow limiting the checksum length (e.g. UDP-Lite). If the 1079 protocol stack does not support checksum length limitation, this 1080 property may be ignored. 1082 * Reliable Data Transfer (Message): When true, the property 1083 specifies that the Message must be reliably transmitted. When 1084 false, and if unreliable transmission is supported by the 1085 underlying protocol, then the Message should be unreliably 1086 transmitted. If the underlying protocol does not support 1087 unreliable transmission, the Message should be reliably 1088 transmitted. 1090 * Message Capacity Profile Override: When true, this expresses a 1091 wish to override the Generic Connection Property Capacity Profile 1092 for this Message. 
Depending on the value, this can, for example, 1093 be implemented by changing the DSCP value of the associated packet 1094 (note that the guidelines in Section 6 of [RFC7657] apply; e.g., 1095 the DSCP value should not be changed for different packets within 1096 a reliable transport protocol session or DCCP connection). 1098 * No Fragmentation: When set, this property limits the message size 1099 to the Maximum Message Size Before Fragmentation or Segmentation 1100 (see Section 10.1.7 of [I-D.ietf-taps-interface]). Messages 1101 larger than this size generate an error. Setting this avoids 1102 transport-layer segmentation or network-layer fragmentation. When 1103 used with transports running over IP version 4, the Don't Fragment 1104 bit will be set to avoid on-path IP fragmentation ([RFC8304]). 1106 5.1.2. Send Completion 1108 The application should be notified whenever a Message or partial 1109 Message has been consumed by the Protocol Stack, or has failed to 1110 send. The time at which a Message is considered to have been 1111 consumed by the Protocol Stack may vary depending on the protocol. 1112 For example, for a basic datagram protocol like UDP, this may 1113 correspond to the time when the packet is sent into the interface 1114 driver. For a protocol that buffers data in queues, like TCP, this 1115 may correspond to when the data has entered the send buffer. The 1116 time at which a Message failed to send is when the Transport Services 1117 implementation (including the Protocol Stack) has not successfully 1118 sent the entire Message content or partial Message content on any 1119 open candidate connection; this can depend on protocol-specific 1120 timeouts. 1122 5.1.3. Batching Sends 1124 Since sending a Message may involve a context switch between the 1125 application and the Transport Services system, sending patterns that 1126 involve multiple small Messages can incur high overhead if each needs 1127 to be enqueued separately.
To avoid this, the application can 1128 indicate a batch of Send actions through the API. When this is used, 1129 the implementation can defer the processing of Messages until the 1130 batch is complete. 1132 5.2. Receiving Messages 1134 Similar to sending, Receiving a Message is determined by the top- 1135 level protocol in the established Protocol Stack. The main 1136 difference with Receiving is that the size and boundaries of the 1137 Message are not known beforehand. The application can communicate in 1138 its Receive action the parameters for the Message, which can help the 1139 Transport Services implementation know how much data to deliver and 1140 when. For example, if the application only wants to receive a 1141 complete Message, the implementation should wait until an entire 1142 Message (datagram, stream, or frame) is read before delivering any 1143 Message content to the application. This requires the implementation 1144 to understand where messages end, either via a supplied deframer or 1145 because the top-level protocol in the established Protocol Stack 1146 preserves message boundaries. If the top-level protocol only 1147 supports a byte-stream and no framers were supported, the application 1148 can control the flow of received data by specifying the minimum 1149 number of bytes of Message content it wants to receive at one time. 1151 If a Connection finishes before a requested Receive action can be 1152 satisfied, the Transport Services API should deliver any partial 1153 Message content outstanding, or if none is available, an indication 1154 that there will be no more received Messages. 1156 5.3. Handling of data for fast-open protocols 1158 Several protocols allow sending higher-level protocol or application 1159 data during their protocol establishment, such as TCP Fast Open 1160 [RFC7413] and TLS 1.3 [RFC8446]. This approach is referred to as 1161 sending Zero-RTT (0-RTT) data. 
This is a desirable feature, but 1162 poses challenges to an implementation that uses racing during 1163 connection establishment. 1165 The amount of data that can be sent as 0-RTT data varies by protocol 1166 and can be queried by the application using the Maximum Message Size 1167 Concurrent with Connection Establishment Connection Property. An 1168 implementation can set this property according to the protocols that 1169 it will race based on the given Selection Properties when the 1170 application requests to establish a connection. 1172 If the application has 0-RTT data to send in any protocol handshakes, 1173 it needs to provide this data before the handshakes have begun. When 1174 racing, this means that the data should be provided before the 1175 process of connection establishment has begun. If the application 1176 wants to send 0-RTT data, it must indicate this to the implementation 1177 by setting the Safely Replayable send parameter to true when sending 1178 the data. In general, 0-RTT data may be replayed (for example, if a 1179 TCP SYN contains data, and the SYN is retransmitted, the data will be 1180 retransmitted as well but may be considered as a new connection 1181 instead of a retransmission). Also, when racing connections, 1182 different leaf nodes have the opportunity to send the same data 1183 independently. If data is truly safely replayable, this should be 1184 permissible. 1186 Once the application has provided its 0-RTT data, a Transport 1187 Services implementation should keep a copy of this data and provide 1188 it to each new leaf node that is started and for which a 0-RTT 1189 protocol is being used. 1191 It is also possible that protocol stacks within a particular leaf 1192 node use 0-RTT handshakes without any safely replayable application 1193 data. For example, TCP Fast Open could use a Client Hello from TLS 1194 as its 0-RTT data, shortening the cumulative handshake time. 
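
The handling of application 0-RTT data during racing, as described above, can be sketched as follows. This is a minimal illustration, not a definitive design: the LeafNode and EarlyDataStore classes and their methods are hypothetical, and the sketch only models keeping a copy of the data and handing it to each leaf node that uses a 0-RTT protocol.

```python
from dataclasses import dataclass


@dataclass
class LeafNode:
    """Hypothetical leaf of the candidate tree."""
    name: str
    supports_0rtt: bool
    early_data: bytes = b""


class EarlyDataStore:
    """Keeps the application's 0-RTT data while candidates are raced."""

    def __init__(self):
        self._data = b""

    def provide(self, data, safely_replayable):
        """Accept data for 0-RTT use only if marked Safely Replayable."""
        if not safely_replayable:
            return False
        self._data = bytes(data)  # keep an independent copy
        return True

    def start_leaf(self, leaf):
        """Hand the stored data to a leaf that uses a 0-RTT protocol."""
        if leaf.supports_0rtt and self._data:
            leaf.early_data = self._data
        return leaf
```

Because every 0-RTT leaf receives the same bytes, the same application data may be sent more than once, which is why the sketch refuses data that is not marked Safely Replayable.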
1196 0-RTT handshakes often rely on previous state, such as TCP Fast Open 1197 cookies, previously established TLS tickets, or out-of-band 1198 distributed pre-shared keys (PSKs). Implementations should be aware 1199 of security concerns around using these tokens across multiple 1200 addresses or paths when racing. In the case of TLS, any given ticket 1201 or PSK should only be used on one leaf node, since servers will 1202 likely reject duplicate tickets in order to prevent replays (see 1203 Section 8.1 of [RFC8446]). If implementations have multiple tickets 1204 available from a previous connection, each leaf node attempt can use 1205 a different ticket. In effect, each leaf node will send the same 1206 early application data, yet encoded (encrypted) differently on the 1207 wire. 1209 6. Implementing Message Framers 1211 Message Framers are functions that define simple transformations 1212 between application Message data and raw transport protocol data. A 1213 Framer can encapsulate or encode outbound Messages, and decapsulate 1214 or decode inbound data into Messages. 1216 While many protocols can be represented as Message Framers, for the 1217 purposes of the Transport Services API, these are ways for 1218 applications or application frameworks to define their own Message 1219 parsing to be included within a Connection's Protocol Stack. As an 1220 example, TLS is exposed as a protocol natively supported by the 1221 Transport Services API, even though it could also serve the purpose 1222 of framing data over TCP. 1224 Most Message Framers fall into one of two categories: 1226 * Header-prefixed record formats, such as a basic Type-Length-Value 1227 (TLV) structure 1229 * Delimiter-separated formats, such as HTTP/1.1. 1231 Common Message Framers can be provided by a Transport Services 1232 implementation, but an implementation ought to allow custom Message 1233 Framers to be defined by the application or some other piece of 1234 software.
This section describes one possible API for defining Message Framers as an example.

6.1. Defining Message Framers

A Message Framer is primarily defined by the code that handles events for a framer implementation, specifically how it handles inbound and outbound data parsing. The function that implements custom framing logic will be referred to as the "framer implementation", which may be provided by a Transport Services implementation or the application itself. The Message Framer refers to the object or function within the main Connection implementation that delivers events to the custom framer implementation whenever data is ready to be parsed or framed.

The Transport Services implementation needs to ensure that all of the events and actions taken on a Message Framer are synchronized to ensure consistent behavior. For example, some of the actions defined below (such as PrependFramer and StartPassthrough) modify how data flows in a protocol stack, and require synchronization with sending and parsing data in the Message Framer.

When a Connection establishment attempt begins, an event can be delivered to notify the framer implementation that a new Connection is being created. Similarly, a stop event can be delivered when a Connection is being torn down. The framer implementation can use the Connection object to look up specific properties of the Connection or the network being used that may influence how to frame Messages.

   MessageFramer -> Start(Connection)
   MessageFramer -> Stop(Connection)

When a Message Framer generates a Start event, the framer implementation has the opportunity to start writing some data prior to the Connection delivering its Ready event. This allows the implementation to communicate control data to the Remote Endpoint that can be used to parse Messages.
   MessageFramer.MakeConnectionReady(Connection)

Similarly, when a Message Framer generates a Stop event, the framer implementation has the opportunity to write some final data or clean up its local state before the Closed event is delivered to the Application. The framer implementation can indicate that it has finished with this.

   MessageFramer.MakeConnectionClosed(Connection)

At any time, if the implementation encounters a fatal error, it can also cause the Connection to fail and provide an error.

   MessageFramer.FailConnection(Connection, Error)

Should the framer implementation deem the candidate selected during racing unsuitable, it can signal this to the Transport Services API by failing the Connection prior to marking it as ready. If there are no other candidates available, the Connection will fail. Otherwise, the Connection will select a different candidate and the Message Framer will generate a new Start event.

Before an implementation marks a Message Framer as ready, it can also dynamically add a protocol or framer above it in the stack. This allows protocols that need to add TLS conditionally, like STARTTLS [RFC3207], to modify the Protocol Stack based on a handshake result.

   otherFramer := NewMessageFramer()
   MessageFramer.PrependFramer(Connection, otherFramer)

A Message Framer might also choose to go into a passthrough mode once an initial exchange or handshake has been completed, such as the STARTTLS case mentioned above. This can also be useful for proxy protocols like SOCKS [RFC1928] or HTTP CONNECT [RFC7230]. In such cases, a Message Framer implementation can intercept sending and receiving of messages at first, but then indicate that no more processing is needed.

   MessageFramer.StartPassthrough()

6.2. Sender-side Message Framing

Message Framers generate an event whenever a Connection sends a new Message.

   MessageFramer -> NewSentMessage

Upon receiving this event, a framer implementation is responsible for performing any necessary transformations and sending the resulting data back to the Message Framer, which will in turn send it to the next protocol. Implementations SHOULD ensure that there is a way to pass the original data through without copying to improve performance.

   MessageFramer.Send(Connection, Data)

To provide an example, a simple protocol that adds a length as a header would receive the NewSentMessage event, create a data representation of the length of the Message data, and then send a block of data that is the concatenation of the length header and the original Message data.

6.3. Receiver-side Message Framing

In order to parse a received flow of data into Messages, the Message Framer notifies the framer implementation whenever new data is available to parse.

   MessageFramer -> HandleReceivedData

Upon receiving this event, the framer implementation can inspect the inbound data. The data is parsed from a particular cursor representing the unprocessed data. The application requests a specific amount of data it needs to have available in order to parse. If the data is not available, the parse fails.

   MessageFramer.Parse(Connection, MinimumIncompleteLength, MaximumLength) -> (Data, MessageContext, IsEndOfMessage)

The framer implementation can directly advance the receive cursor once it has parsed data to effectively discard data (for example, discard a header once the content has been parsed).
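
As an illustration of the length-header framer mentioned in Section 6.2 and of the cursor-based parsing described above, the following sketch frames and parses Messages with a 4-byte length prefix. It is a simplified stand-in rather than the API of this section: the header size, the ReceiveBuffer class, and the function names are assumptions made for the example, and the sketch waits for a whole Message to arrive instead of earmarking bytes that have not yet been received.

```python
import struct
from typing import List, Optional

HEADER_LEN = 4  # assumption: 4-byte big-endian length header


def frame(message: bytes) -> bytes:
    """Sender side: concatenate the length header and the Message data."""
    return struct.pack("!I", len(message)) + message


class ReceiveBuffer:
    """Hypothetical view of the unprocessed inbound data at the cursor."""

    def __init__(self) -> None:
        self._buf = bytearray()

    def append(self, data: bytes) -> None:
        self._buf += data

    def parse(self, minimum: int, maximum: int) -> Optional[bytes]:
        """Return up to `maximum` bytes, or None if fewer than `minimum`
        bytes are available. The cursor is not moved."""
        if len(self._buf) < minimum:
            return None
        return bytes(self._buf[:maximum])

    def advance(self, length: int) -> None:
        """Advance the receive cursor, discarding `length` bytes."""
        del self._buf[:length]


def handle_received_data(buf: ReceiveBuffer) -> List[bytes]:
    """Receiver side: return every complete Message now in the buffer."""
    delivered = []
    while True:
        header = buf.parse(HEADER_LEN, HEADER_LEN)
        if header is None:
            break  # header incomplete, wait for more data
        (length,) = struct.unpack("!I", header)
        total = HEADER_LEN + length
        record = buf.parse(total, total)
        if record is None:
            break  # body incomplete, leave the cursor at the header
        buf.advance(total)
        delivered.append(record[HEADER_LEN:])
    return delivered
```

For example, feeding the first six bytes of frame(b"hello") yields no Messages, and appending the remainder yields the single Message b"hello".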
To deliver a Message to the application, the framer implementation can either directly deliver data that it has allocated, or deliver a range of data directly from the underlying transport and simultaneously advance the receive cursor.

   MessageFramer.AdvanceReceiveCursor(Connection, Length)
   MessageFramer.DeliverAndAdvanceReceiveCursor(Connection, MessageContext, Length, IsEndOfMessage)
   MessageFramer.Deliver(Connection, MessageContext, Data, IsEndOfMessage)

Note that MessageFramer.DeliverAndAdvanceReceiveCursor allows the framer implementation to earmark bytes as part of a Message even before they are received by the transport. This allows the delivery of very large Messages without requiring the implementation to directly inspect all of the bytes.

To provide an example, a simple protocol that parses a length as a header value would receive the HandleReceivedData event, and call Parse with a minimum and maximum set to the length of the header field. Once the parse succeeded, it would call AdvanceReceiveCursor with the length of the header field, and then call DeliverAndAdvanceReceiveCursor with the length of the body that was parsed from the header, marking the new Message as complete.

7. Implementing Connection Management

Once a Connection is established, the Transport Services API allows applications to interact with the Connection by modifying or inspecting Connection Properties. A Connection can also generate events in the form of Soft Errors.

The set of Connection Properties that are supported for setting and getting on a Connection are described in [I-D.ietf-taps-interface].
For any properties that are generic, and thus could apply to all protocols being used by a Connection, the Transport Services implementation should store the properties in storage common to all protocols, and notify all protocol instances in the Protocol Stack whenever the properties have been modified by the application. For protocol-specific properties, such as the User Timeout that applies to TCP, the Transport Services implementation only needs to update the relevant protocol instance.

If an error is encountered in setting a property (for example, if the application tries to set a TCP-specific property on a Connection that is not using TCP), the action should fail gracefully. The application may be informed of the error, but the Connection itself should not be terminated.

The Transport Services API should allow protocol instances in the Protocol Stack to pass up arbitrary generic or protocol-specific errors that can be delivered to the application as Soft Errors. These allow the application to be informed of ICMP errors, and other similar events.

7.1. Pooled Connection

For applications that do not need in-order delivery of Messages, the Transport Services implementation may distribute Messages of a single Connection across several underlying transport connections or multiple streams of multi-streaming connections between endpoints, as long as all of these satisfy the Selection Properties. The Transport Services implementation will then hide this connection management and only expose a single Connection object, which we here call a "Pooled Connection". This is in contrast to Connection Groups, which explicitly expose combined treatment of Connections, giving the application control over multiplexing, for example.
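
As a rough illustration of how a single exposed Connection object can hide several underlying transport connections, the sketch below distributes Messages over a pool. The Transport and PooledConnection classes are hypothetical, and round-robin dispatch is only a placeholder policy chosen for the example; this document does not prescribe how Messages are assigned to underlying connections.

```python
from typing import List


class Transport:
    """Hypothetical underlying transport connection or stream."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.usable = True
        self.sent: List[bytes] = []

    def send(self, message: bytes) -> None:
        self.sent.append(message)


class PooledConnection:
    """Hypothetical single Connection object backed by several transports."""

    def __init__(self, transports: List[Transport]) -> None:
        self._transports = list(transports)
        self._next = 0  # index of the next transport to try

    def send(self, message: bytes) -> str:
        """Send on the next usable transport and return its name."""
        for _ in range(len(self._transports)):
            transport = self._transports[self._next]
            self._next = (self._next + 1) % len(self._transports)
            if transport.usable:
                transport.send(message)
                return transport.name
        # The pool object itself stays in place; only this send fails.
        raise ConnectionError("no usable underlying connection")
```

The application only ever holds the PooledConnection, so an underlying transport becoming unusable changes where Messages go without changing the object the application uses.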
Pooled Connections can be useful when the application using the Transport Services system implements a protocol such as HTTP, which employs request/response pairs and does not require in-order delivery of responses. This enables implementations of Transport Services systems to realize transparent connection coalescing, connection migration, and to perform per-message endpoint and path selection by choosing among multiple underlying connections.

7.2. Handling Path Changes

When a path change occurs, e.g., when the IP address of an interface changes or a new interface becomes available, the Transport Services implementation is responsible for notifying the Protocol Instance of the change. The path change may interrupt connectivity on a path for an active connection or provide an opportunity for a transport that supports multipath or migration to adapt to the new paths. Note that, in the model of the Transport Services API, migration is considered a part of multipath connectivity; it is just a limiting policy on multipath usage. If the multipath Selection Property is set to Disabled, migration is disallowed.

For protocols that do not support multipath or migration, the Protocol Instances should be informed of the path change, but should not be forcibly disconnected if the previously used path becomes unavailable. There are many common user scenarios that can lead to a path becoming temporarily unavailable, and then recovering before the transport protocol reaches a timeout error. These are particularly common using mobile devices. Examples include: an Ethernet cable becoming unplugged and then plugged back in; a device losing a Wi-Fi signal while a user is in an elevator, and reattaching when the user leaves the elevator; and a user losing the radio signal while riding a train through a tunnel.
If the device is able to rejoin a network with the same IP address, a stateful transport connection can generally resume. Thus, while it is useful for a Protocol Instance to be aware of a temporary loss of connectivity, the Transport Services implementation should not aggressively close connections in these scenarios.

If the Protocol Stack includes a transport protocol that supports multipath connectivity, the Transport Services implementation should also inform the Protocol Instance of potentially new paths that become permissible based on the multipath Selection Property and the multipath-policy Connection Property choices made by the application. A protocol can then establish new subflows over new paths while an active path is still available or, if migration is supported, also after a break has been detected, and should attempt to tear down subflows over paths that are no longer used. The Connection Property multipath-policy of the Transport Services API allows an application to indicate when and how different paths should be used. However, detailed handling of these policies is still implementation-specific. For example, if the multipath Selection Property is set to active, the decision about when to create a new path or to announce a new path or set of paths to the Remote Endpoint, e.g., in the form of additional IP addresses, is implementation-specific. If the Protocol Stack includes a transport protocol that does not support multipath, but does support migrating between paths, the update to the set of available paths can trigger the connection to be migrated.

In the case of Pooled Connections (Section 7.1), the Transport Services implementation may add connections over new paths to the pool if permissible based on the multipath policy and Selection Properties.
In case a previously used path becomes unavailable, the transport system may disconnect all connections that require this path, but should not disconnect the pooled connection object exposed to the application. The strategy to do so is implementation-specific, but should be consistent with the behavior of multipath transports.

8. Implementing Connection Termination

With TCP, when an application closes a connection, this means that it has no more data to send (but expects all data that has been handed over to be reliably delivered). However, with TCP only, "close" does not mean that the application will stop receiving data. This is related to TCP's ability to support half-closed connections.

SCTP is an example of a protocol that does not support such half-closed connections. Hence, with SCTP, the meaning of "close" is stricter: an application has no more data to send (but expects all data that has been handed over to be reliably delivered), and will also not receive any more data.

Implementing a protocol-independent transport system means that the exposed semantics must be the strictest subset of the semantics of all supported protocols. Hence, as is common with all reliable transport protocols, after a Close action, the application can expect to have its reliability requirements honored regarding the data provided to the Transport Services API, but it cannot expect to be able to read any more data after calling Close.

Abort differs from Close only in that no guarantees are given regarding any data that the application sent to the Transport Services API before calling Abort.

As explained in Section 4.5, when a new stream is multiplexed on an already existing connection of a Transport Protocol Instance, there is no need for a connection establishment procedure.
Because the Connections that are offered by a Transport Services implementation can be implemented as streams that are multiplexed on a transport protocol's connection, it cannot be guaranteed that an Initiate action from one endpoint provokes a ConnectionReceived event at its peer.

For Close (provoking a Finished event) and Abort (provoking a ConnectionError event), the same logic applies: while it is desirable to be informed when a peer closes or aborts a Connection, whether this is possible depends on the underlying protocol, and no guarantees can be given. With SCTP, the transport system can use the stream reset procedure to cause a Finished event upon a Close action from the peer [NEAT-flow-mapping].

9. Cached State

Beyond a single Connection's lifetime, it is useful for an implementation to keep state and history. This cached state can help improve future Connection establishment due to re-using results and credentials, and favoring paths and protocols that performed well in the past.

Cached state may be associated with different endpoints for the same Connection, depending on the protocol generating the cached content. For example, session tickets for TLS are associated with specific endpoints, and thus should be cached based on a Connection's hostname endpoint (if applicable). However, performance characteristics of a path are more likely tied to the IP address and subnet being used.

9.1. Protocol state caches

Some protocols will have long-term state to be cached in association with endpoints. This state often has some time after which it is expired, so the implementation should allow each protocol to specify an expiration for cached content.
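
One way to let each protocol specify an expiration for its cached content is to store an absolute expiry time with every entry and to drop the entry on lookup once that time has passed. The sketch below illustrates this under stated assumptions: the ProtocolStateCache class, its put and get methods, and the (protocol, endpoint) key are hypothetical names chosen for the example, not part of any specified API.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class ProtocolStateCache:
    """Hypothetical per-endpoint cache with protocol-chosen lifetimes."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        # The clock is injectable so that expiry can be exercised in tests.
        self._clock = clock
        self._entries: Dict[Tuple[str, str], Tuple[float, Any]] = {}

    def put(self, protocol: str, endpoint: str, value: Any,
            lifetime: float) -> None:
        """Store `value`; `lifetime` (seconds) is chosen by the protocol."""
        expires_at = self._clock() + lifetime
        self._entries[(protocol, endpoint)] = (expires_at, value)

    def get(self, protocol: str, endpoint: str) -> Optional[Any]:
        """Return the cached value, or None if it is absent or expired."""
        key = (protocol, endpoint)
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired entries are removed lazily
            return None
        return value
```

A DNS instance could, for instance, pass the record's TTL as the lifetime, while a TLS instance could pass the ticket lifetime announced by the server.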
Examples of cached protocol state include:

*  The DNS protocol can cache resolution answers (A and AAAA queries, for example), associated with a Time To Live (TTL) to be used for future hostname resolutions without having to query the DNS resolver again.

*  TLS caches session state and tickets based on a hostname, which can be used for resuming sessions with a server.

*  TCP can cache cookies for use in TCP Fast Open.

Cached protocol state is primarily used during Connection establishment for a single Protocol Stack, but may be used to influence an implementation's preference between several candidate Protocol Stacks. For example, if two IP address endpoints are otherwise equally preferred, an implementation may choose to attempt a connection to an address for which it has a TCP Fast Open cookie.

Applications can use the Transport Services API to request that a Connection Group maintain a separate cache for protocol state. Connections in the group will not use cached state from connections outside the group, and connections outside the group will not use state cached from connections inside the group. This may be necessary, for example, if application-layer identifiers rotate and clients wish to avoid linkability via trackable TLS tickets or TFO cookies.

9.2. Performance caches

In addition to protocol state, Protocol Instances should provide data into a performance-oriented cache to help guide future protocol and path selection. Some performance information can be gathered generically across several protocols to allow predictive comparisons between protocols on given paths:

*  Observed Round Trip Time

*  Connection Establishment latency

*  Connection Establishment success rate

These items can be cached on a per-address and per-subnet granularity, and averaged between different values.
The information should be cached on a per-network basis, since it is expected that different network attachments will have different performance characteristics. Besides Protocol Instances, other system entities may also provide data into performance-oriented caches. This could for instance be signal strength information reported by radio modems like Wi-Fi and mobile broadband or information about the battery-level of the device. Furthermore, the system may cache the observed maximum throughput on a path as an estimate of the available bandwidth.

An implementation should use this information, when possible, to influence preference between candidate paths, endpoints, and protocol options. Eligible options that historically had significantly better performance than others should be selected first when gathering candidates (see Section 4.2) to ensure better performance for the application.

The reasonable lifetime for cached performance values will vary depending on the nature of the value. Certain information, like the connection establishment success rate to a Remote Endpoint using a given protocol stack, can be stored for a long period of time (hours or longer), since it is expected that the capabilities of the Remote Endpoint are not changing very quickly. On the other hand, the Round Trip Time observed by TCP over a particular network path may vary over a relatively short time interval. For such values, the implementation should remove them from the cache more quickly, or treat older values with less confidence/weight.

[I-D.ietf-tcpm-2140bis] provides guidance about sharing of TCP Control Block information between connections on initialization.

10. Specific Transport Protocol Considerations

Each protocol that is supported by a Transport Services implementation should have a well-defined API mapping.
API mappings for a protocol are important for Connections in which a given protocol is the "top" of the Protocol Stack. For example, the mapping of the Send function for TCP applies to Connections in which the application directly sends over TCP.

Each protocol has a notion of Connectedness. Possible values for Connectedness are:

*  Connectionless. Connectionless protocols do not establish explicit state between endpoints, and do not perform a handshake during Connection establishment.

*  Connected. Connected protocols establish state between endpoints, and perform a handshake during Connection establishment. The handshake may be 0-RTT to send data or resume a session, but bidirectional traffic is required to confirm connectedness.

*  Multiplexing Connected. Multiplexing Connected protocols share properties with Connected protocols, but also explicitly support opening multiple application-level flows. This means that they can support cloning new Connection objects without a new explicit handshake.

Protocols also define a notion of Data Unit. Possible values for Data Unit are:

*  Byte-stream. Byte-stream protocols do not define any Message boundaries of their own apart from the end of a stream in each direction.

*  Datagram. Datagram protocols define Message boundaries at the same level of transmission, such that only complete (not partial) Messages are supported.

*  Message. Message protocols support Message boundaries that can be sent and received either as complete or partial Messages. Maximum Message lengths can be defined, and Messages can be partially reliable.

Below, terms in capitals with a dot (e.g., "CONNECT.SCTP") refer to the primitives with the same name in section 4 of [RFC8303].
For further implementation details, the description of these primitives in [RFC8303] points to section 3 of [RFC8303] and section 3 of [RFC8304], which refers back to the relevant specifications for each protocol. This back-tracking method applies to all elements of [RFC8923] (see appendix D of [I-D.ietf-taps-interface]): they are listed in appendix A of [RFC8923] with an implementation hint in the same style, pointing back to section 4 of [RFC8303].

This document defines the API mappings for protocols defined in [RFC8923]. Other protocol mappings can be provided as separate documents, following the mapping template in Appendix A.

10.1. TCP

Connectedness: Connected

Data Unit: Byte-stream

API mappings for TCP are as follows:

Connection Object: TCP connections between two hosts map directly to Connection objects.

Initiate: CONNECT.TCP. Calling Initiate on a TCP Connection causes it to reserve a local port, and send a SYN to the Remote Endpoint.

InitiateWithSend: CONNECT.TCP with parameter user message. Early safely replayable data is sent on a TCP Connection in the SYN, as TCP Fast Open data.

Ready: A TCP Connection is ready once the three-way handshake is complete.

InitiateError: Failure of CONNECT.TCP. TCP can throw various errors during connection setup. Specifically, it is important to handle a RST being sent by the peer during the handshake.

ConnectionError: Once established, TCP throws errors whenever the connection is disconnected, such as due to receiving a RST from the peer.

Listen: LISTEN.TCP. Calling Listen for TCP binds a local port and prepares it to receive inbound SYN packets from peers.

ConnectionReceived: TCP Listeners will deliver new connections once they have replied to an inbound SYN with a SYN-ACK.
Clone: Calling Clone on a TCP Connection creates a new Connection with equivalent parameters. These Connections, and Connections generated via later calls to Clone on an Established Connection, form a Connection Group. To realize entanglement for these Connections, with the exception of Connection Priority, changing a Connection Property on one of them must affect the Connection Properties of the others too. No guarantees of honoring the Connection Property Connection Priority are given, and thus it is safe for an implementation of a transport system to ignore this property. When it is reasonable to assume that Connections traverse the same path (e.g., when they share the same encapsulation), support for it can also experimentally be implemented using a congestion control coupling mechanism (see for example [TCP-COUPLING] or [RFC3124]).

Send: SEND.TCP. TCP does not on its own preserve Message boundaries. Calling Send on a TCP connection lays out the bytes on the TCP send stream without any other delineation. Any Message marked as Final will cause TCP to send a FIN once the Message has been completely written, by calling CLOSE.TCP immediately upon successful termination of SEND.TCP. Note that transmitting a Message marked as Final should not cause the Closed event to be delivered to the application, as it will still be possible to receive data until the peer closes or aborts the TCP connection.

Receive: With RECEIVE.TCP, TCP delivers a stream of bytes without any Message delineation. All data delivered in the Received or ReceivedPartial event will be part of a single stream-wide Message that is marked Final (unless a Message Framer is used). EndOfMessage will be delivered when the TCP Connection has received a FIN (CLOSE-EVENT.TCP) from the peer.
Note that reception of a FIN should not cause the Closed event to be delivered to the application, as it will still be possible for the application to send data.

Close: Calling Close on a TCP Connection indicates that the Connection should be gracefully closed (CLOSE.TCP) by sending a FIN to the peer. It will then still be possible to receive data until the peer closes or aborts the TCP connection. The Closed event will be issued upon reception of a FIN.

Abort: Calling Abort on a TCP Connection indicates that the Connection should be immediately closed by sending a RST to the peer (ABORT.TCP).

10.2. MPTCP

Connectedness: Connected

Data Unit: Byte-stream

The Transport Services API mappings for MPTCP are identical to TCP. MPTCP adds support for multipath properties, such as "Multipath Transport" and "Policy for using Multipath Transports".

10.3. UDP

Connectedness: Connectionless

Data Unit: Datagram

API mappings for UDP are as follows:

Connection Object: UDP connections represent a pair of specific IP addresses and ports on two hosts.

Initiate: CONNECT.UDP. Calling Initiate on a UDP Connection causes it to reserve a local port, but does not generate any traffic.

InitiateWithSend: Early data on a UDP Connection does not have any special meaning. The data is sent whenever the Connection is Ready.

Ready: A UDP Connection is ready once the system has reserved a local port and has a path to send to the Remote Endpoint.

InitiateError: UDP Connections can only generate errors on initiation due to port conflicts on the local system.

ConnectionError: Once in use, UDP throws "soft errors" (ERROR.UDP(-Lite)) upon receiving ICMP notifications indicating failures in the network.

Listen: LISTEN.UDP. Calling Listen for UDP binds a local port and prepares it to receive inbound UDP datagrams from peers.
ConnectionReceived: UDP Listeners will deliver new connections once they have received traffic from a new Remote Endpoint.

Clone: Calling Clone on a UDP Connection creates a new Connection with equivalent parameters. The two Connections are otherwise independent.

Send: SEND.UDP(-Lite). Calling Send on a UDP connection sends the data as the payload of a complete UDP datagram. Marking Messages as Final does not change anything in the datagram's contents. Upon sending a UDP datagram, some relevant fields and flags in the IP header can be controlled: DSCP (SET_DSCP.UDP(-Lite)), DF in IPv4 (SET_DF.UDP(-Lite)) and ECN flag (SET_ECN.UDP(-Lite)).

Receive: RECEIVE.UDP(-Lite). UDP only delivers complete Messages to Received, each of which represents a single datagram received in a UDP packet. Upon receiving a UDP datagram, the ECN flag from the IP header can be obtained (GET_ECN.UDP(-Lite)).

Close: Calling Close on a UDP Connection (ABORT.UDP(-Lite)) releases the local port reservation.

Abort: Calling Abort on a UDP Connection (ABORT.UDP(-Lite)) is identical to calling Close.

10.4. UDP-Lite

Connectedness: Connectionless

Data Unit: Datagram

The Transport Services API mappings for UDP-Lite are identical to UDP. Properties that require checksum coverage are not supported by UDP-Lite, such as "Corruption Protection Length", "Full Checksum Coverage on Sending", "Required Minimum Corruption Protection Coverage for Receiving", and "Full Checksum Coverage on Receiving".

10.5. UDP Multicast Receive

Connectedness: Connectionless

Data Unit: Datagram

API mappings for Receiving Multicast UDP are as follows:

Connection Object: Established UDP Multicast Receive connections represent a pair of specific IP addresses and ports.
The "unidirectional receive" transport property is required, and the Local Endpoint must be configured with a group IP address and a port.

Initiate: Calling Initiate on a UDP Multicast Receive Connection causes an immediate InitiateError. This is an unsupported operation.

InitiateWithSend: Calling InitiateWithSend on a UDP Multicast Receive Connection causes an immediate InitiateError. This is an unsupported operation.

Ready: A UDP Multicast Receive Connection is ready once the system has received traffic for the appropriate group and port.

InitiateError: UDP Multicast Receive Connections generate an InitiateError if Initiate is called.

ConnectionError: Once in use, UDP throws "soft errors" (ERROR.UDP(-Lite)) upon receiving ICMP notifications indicating failures in the network.

Listen: LISTEN.UDP. Calling Listen for UDP Multicast Receive binds a local port, prepares it to receive inbound UDP datagrams from peers, and issues a multicast host join. If a Remote Endpoint with an address is supplied, the join is Source-specific Multicast, and the path selection is based on the route to the Remote Endpoint. If a Remote Endpoint is not supplied, the join is Any-source Multicast, and the path selection is based on the outbound route to the group supplied in the Local Endpoint.

There are cases where it is required to open multiple connections for the same address(es). For example, one Connection might be opened to a multicast group for a multicast control bus, and another application later opens a separate Connection to the same group to send signals to and/or receive signals from the common bus. In such cases, the Transport Services system needs to explicitly enable re-use of the same set of addresses (equivalent to setting SO_REUSEADDR in the socket API).
1890 ConnectionReceived: UDP Multicast Receive Listeners will deliver new 1891 connections once they have received traffic from a new Remote 1892 Endpoint. 1894 Clone: Calling Clone on a UDP Multicast Receive Connection creates a 1895 new Connection with equivalent parameters. The two Connections 1896 are otherwise independent. 1898 Send: SEND.UDP(-Lite). Calling Send on a UDP Multicast Receive 1899 connection causes an immediate SendError. This is an unsupported 1900 operation. 1902 Receive: RECEIVE.UDP(-Lite). The Receive operation in a UDP 1903 Multicast Receive connection only delivers complete Messages to 1904 Received, each of which represents a single datagram received in a 1905 UDP packet. Upon receiving a UDP datagram, the ECN flag from the 1906 IP header can be obtained (GET_ECN.UDP(-Lite)). 1908 Close: Calling Close on a UDP Multicast Receive Connection 1909 (ABORT.UDP(-Lite)) releases the local port reservation and leaves 1910 the group. 1912 Abort: Calling Abort on a UDP Multicast Receive Connection 1913 (ABORT.UDP(-Lite)) is identical to calling Close. 1915 10.6. SCTP 1917 Connectedness: Connected 1919 Data Unit: Message 1921 API mappings for SCTP are as follows: 1923 Connection Object: Connection objects can be mapped to an SCTP 1924 association or a stream in an SCTP association. Mapping 1925 Connection objects to SCTP streams is called "stream mapping" and 1926 has additional requirements as follows. The following explanation 1927 assumes a client-server communication model. 1929 Stream mapping requires an association to already be in place between 1930 the client and the server, and it requires the server to understand 1931 that a new incoming stream should be represented as a new Connection 1932 Object by the Transport Services system. A new SCTP stream is 1933 created by sending an SCTP message with a new stream id. 
Thus, to 1934 implement stream mapping, the Transport Services API MUST provide a 1935 newly created Connection Object to the application upon the reception 1936 of such a message. The semantics necessary to implement the Transport 1937 Services system's Close and Abort primitives are provided by the stream 1938 reconfiguration (reset) procedure described in [RFC6525]. This also 1939 allows a stream id to be re-used after resetting ("closing") the stream. 1940 To implement this functionality, SCTP stream reconfiguration 1941 [RFC6525] MUST be supported by both the client and the server side. 1943 To avoid head-of-line blocking, stream mapping SHOULD only be 1944 implemented when both sides support message interleaving [RFC8260]. 1945 This allows a sender to schedule transmissions between multiple 1946 streams without risking that transmission of a large message on one 1947 stream might block transmissions on other streams for a long time. 1949 To avoid conflicts between stream ids, the following procedure is 1950 recommended: the first Connection, for which the SCTP association has 1951 been created, MUST always use stream id zero. All additional 1952 Connections are assigned to unused stream ids in growing order. To 1953 avoid a conflict when both endpoints map new Connections 1954 simultaneously, the peer that initiated the association MUST use even 1955 stream ids whereas the remote side MUST map its Connections to odd 1956 stream ids. Both sides maintain a status map of the assigned stream 1957 ids. Generally, new streams SHOULD consume the lowest available 1958 (even or odd, depending on the side) stream id; this rule is relevant 1959 when lower ids become available because Connection objects associated 1960 with the streams are closed. 1962 SCTP stream mapping as described here has been implemented in a 1963 research prototype; a description of this implementation is given in 1964 [NEAT-flow-mapping].
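The stream id assignment procedure above can be sketched as a small allocator. This is an illustrative sketch only, not the implementation described in [NEAT-flow-mapping]; the class name, method names, and the "max_streams" parameter (standing in for the number of streams negotiated for the association) are assumptions made for this example.

```python
class StreamIdAllocator:
    """Tracks assigned stream ids for one SCTP association (hypothetical)."""

    def __init__(self, is_initiator, max_streams):
        # The initiator of the association uses even ids, the remote side odd.
        self.parity = 0 if is_initiator else 1
        self.max_streams = max_streams
        self.in_use = set()  # status map of ids assigned by either side

    def allocate(self):
        """Return the lowest free id of this side's parity."""
        for stream_id in range(self.parity, self.max_streams, 2):
            if stream_id not in self.in_use:
                self.in_use.add(stream_id)
                return stream_id
        # Out of streams: more streams would have to be requested first.
        raise RuntimeError("no free stream id available")

    def mark_peer(self, stream_id):
        """Record an id that the peer started using."""
        self.in_use.add(stream_id)

    def release(self, stream_id):
        """Make an id available again after its stream was reset."""
        self.in_use.discard(stream_id)
```

On the initiating side, the first call to allocate() yields stream id zero, matching the rule for the first Connection. Because the search always starts at the lowest id of the local parity, an id released by a closed Connection is picked up again before higher ids are consumed.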
1966 Initiate: If this is the only Connection object that is assigned to 1967 the SCTP Association or stream mapping is not used, CONNECT.SCTP 1968 is called. Else, unless the Selection Property 1969 activeReadBeforeSend is Preferred or Required, a new stream is 1970 used: if there are enough streams available, Initiate is a local 1971 operation that assigns a new stream id to the Connection object. 1972 The number of streams is negotiated as a parameter of the prior 1973 CONNECT.SCTP call, and it represents a trade-off between local 1974 resource usage and the number of Connection objects that can be 1975 mapped without requiring a reconfiguration signal. When running 1976 out of streams, ADD_STREAM.SCTP must be called. 1978 InitiateWithSend: If this is the only Connection object that is 1979 assigned to the SCTP association or stream mapping is not used, 1980 CONNECT.SCTP is called with the "user message" parameter. Else, a 1981 new stream is used (see Initiate for how to handle running out of 1982 streams), and this just sends the first message on a new stream. 1984 Ready: Initiate or InitiateWithSend returns without an error, i.e. 1985 SCTP's four-way handshake has completed. If an association with 1986 the peer already exists, stream mapping is used and enough streams 1987 are available, a Connection Object instantly becomes Ready after 1988 calling Initiate or InitiateWithSend. 1990 InitiateError: Failure of CONNECT.SCTP. 1992 ConnectionError: TIMEOUT.SCTP or ABORT-EVENT.SCTP. 1994 Listen: LISTEN.SCTP. If an association with the peer already exists 1995 and stream mapping is used, Listen just expects to receive a new 1996 message with a new stream id (chosen in accordance with the stream 1997 id assignment procedure described above). 
1999 ConnectionReceived: LISTEN.SCTP returns without an error (a result 2000 of successful CONNECT.SCTP from the peer), or, in case of stream 2001 mapping, the first message has arrived on a new stream (in this 2002 case, Receive is also invoked). 2004 Clone: Calling Clone on an SCTP association creates a new Connection 2005 object and assigns it a new stream id in accordance with the 2006 stream id assignment procedure described above. If there are not 2007 enough streams available, ADD_STREAM.SCTP must be called. 2009 Priority (Connection): When this value is changed, or a Message with 2010 Message Property Priority is sent, and there are multiple 2011 Connection objects assigned to the same SCTP association, 2012 CONFIGURE_STREAM_SCHEDULER.SCTP is called to adjust the priorities 2013 of streams in the SCTP association. 2015 Send: SEND.SCTP. Message Properties such as Lifetime and Ordered 2016 map to parameters of this primitive. 2018 Receive: RECEIVE.SCTP. The "partial flag" of RECEIVE.SCTP invokes a 2019 ReceivedPartial event. 2021 Close: If this is the only Connection object that is assigned to the 2022 SCTP association, CLOSE.SCTP is called, and the Closed event will be 2023 delivered to the application upon the ensuing CLOSE-EVENT.SCTP. 2024 Else, the Connection object is one of several Connection objects 2025 that are assigned to the same SCTP association, and RESET_STREAM.SCTP 2026 must be called, which informs the peer that the stream will no longer 2027 be used for mapping and can be used by future Initiate, 2028 InitiateWithSend or Listen calls. At the peer, the event 2029 RESET_STREAM-EVENT.SCTP will fire, which the peer must answer by 2030 issuing RESET_STREAM.SCTP too. The resulting local RESET_STREAM- 2031 EVENT.SCTP informs the Transport Services system that the stream id 2032 can now be re-used by the next Initiate, InitiateWithSend or Listen 2033 calls, and invokes a Closed event towards the application.
2035 Abort: If this is the only Connection object that is assigned to the 2036 SCTP association, ABORT.SCTP is called. Else, the Connection object 2037 is one of several Connection objects that are assigned to the 2038 same SCTP association, and shutdown proceeds as described under Close. 2040 11. IANA Considerations 2042 RFC-EDITOR: Please remove this section before publication. 2044 This document has no actions for IANA. 2046 12. Security Considerations 2048 [I-D.ietf-taps-arch] outlines general security considerations and 2049 requirements for any system that implements the Transport Services 2050 architecture. [I-D.ietf-taps-interface] provides further discussion 2051 on security and privacy implications of the Transport Services API. 2052 This document provides additional guidance on implementation 2053 specifics for the Transport Services API and as such the security 2054 considerations in both of these documents apply. The next two 2055 subsections discuss further considerations that are specific to 2056 mechanisms specified in this document. 2058 12.1. Considerations for Candidate Gathering 2060 Implementations should avoid downgrade attacks that allow network 2061 interference to cause the implementation to select less secure, or 2062 entirely insecure, combinations of paths and protocols. 2064 12.2. Considerations for Candidate Racing 2066 See Section 5.3 for security considerations around racing with 0-RTT 2067 data. 2069 An attacker that knows a particular device is racing several options 2070 during connection establishment may be able to block packets for the 2071 first connection attempt, thus inducing the device to fall back to a 2072 secondary attempt. This is a problem if the secondary attempts have 2073 worse security properties that enable further attacks. 2074 Implementations should ensure that all options have equivalent 2075 security properties to avoid incentivizing attacks.
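One possible way to approach this is to filter the candidate list before racing starts, so that no fallback is weaker than the most preferred option. The following is a minimal sketch under stated assumptions: the Candidate type, its numeric "security_level" ranking, and the function name are hypothetical and not defined by this document, and a real implementation would compare actual security properties rather than a single number.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    security_level: int  # hypothetical ranking: higher means stronger


def raceable(candidates):
    """Drop candidates weaker than the most preferred (first) candidate.

    Assumes 'candidates' is already sorted by preference.
    """
    if not candidates:
        return []
    required = candidates[0].security_level
    # Blocking the first attempt can then never force a weaker fallback.
    return [c for c in candidates if c.security_level >= required]
```

With this filter, an attacker who blocks the first connection attempt gains nothing, because every remaining option is at least as strong as the one that was blocked.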
2077 Since results from the network can determine how a connection attempt 2078 tree is built, such as when DNS returns a list of resolved endpoints, 2079 it is possible for the network to cause an implementation to consume 2080 significant on-device resources. Implementations should limit the 2081 maximum amount of state allowed for any given node, including the 2082 number of child nodes, especially when the state is based on results 2083 from the network. 2085 13. Acknowledgements 2087 This work has received funding from the European Union's Horizon 2020 2088 research and innovation programme under grant agreement No. 644334 2089 (NEAT) and No. 815178 (5GENESIS). 2091 This work has been supported by Leibniz Prize project funds of DFG - 2092 German Research Foundation: Gottfried Wilhelm Leibniz-Preis 2011 (FKZ 2093 FE 570/4-1). 2095 This work has been supported by the UK Engineering and Physical 2096 Sciences Research Council under grant EP/R04144X/1. 2098 This work has been supported by the Research Council of Norway under 2099 its "Toppforsk" programme through the "OCARINA" project. 2101 Thanks to Colin Perkins, Tom Jones, Karl-Johan Grinnemo, and Gorry 2102 Fairhurst for their contributions to the design of this 2103 specification. Thanks also to Stuart Cheshire, Josh Graessley, David 2104 Schinazi, and Eric Kinnear for their implementation and design 2105 efforts, including Happy Eyeballs, that heavily influenced this work. 2107 14. References 2109 14.1. Normative References 2111 [I-D.ietf-taps-arch] 2112 Pauly, T., Trammell, B., Brunstrom, A., Fairhurst, G., and 2113 C. Perkins, "An Architecture for Transport Services", Work 2114 in Progress, Internet-Draft, draft-ietf-taps-arch-12, 3 2115 January 2022, . 2118 [I-D.ietf-taps-interface] 2119 Trammell, B., Welzl, M., Enghardt, T., Fairhurst, G., 2120 Kuehlewind, M., Perkins, C., Tiesel, P. S., Wood, C. A., 2121 Pauly, T., and K.
Rose, "An Abstract Application Layer 2122 Interface to Transport Services", Work in Progress, 2123 Internet-Draft, draft-ietf-taps-interface-14, 3 January 2124 2022, . 2127 [RFC7413] Cheng, Y., Chu, J., Radhakrishnan, S., and A. Jain, "TCP 2128 Fast Open", RFC 7413, DOI 10.17487/RFC7413, December 2014, 2129 . 2131 [RFC7540] Belshe, M., Peon, R., and M. Thomson, Ed., "Hypertext 2132 Transfer Protocol Version 2 (HTTP/2)", RFC 7540, 2133 DOI 10.17487/RFC7540, May 2015, 2134 . 2136 [RFC8303] Welzl, M., Tuexen, M., and N. Khademi, "On the Usage of 2137 Transport Features Provided by IETF Transport Protocols", 2138 RFC 8303, DOI 10.17487/RFC8303, February 2018, 2139 . 2141 [RFC8304] Fairhurst, G. and T. Jones, "Transport Features of the 2142 User Datagram Protocol (UDP) and Lightweight UDP (UDP- 2143 Lite)", RFC 8304, DOI 10.17487/RFC8304, February 2018, 2144 . 2146 [RFC8305] Schinazi, D. and T. Pauly, "Happy Eyeballs Version 2: 2147 Better Connectivity Using Concurrency", RFC 8305, 2148 DOI 10.17487/RFC8305, December 2017, 2149 . 2151 [RFC8421] Martinsen, P., Reddy, T., and P. Patil, "Guidelines for 2152 Multihomed and IPv4/IPv6 Dual-Stack Interactive 2153 Connectivity Establishment (ICE)", BCP 217, RFC 8421, 2154 DOI 10.17487/RFC8421, July 2018, 2155 . 2157 [RFC8446] Rescorla, E., "The Transport Layer Security (TLS) Protocol 2158 Version 1.3", RFC 8446, DOI 10.17487/RFC8446, August 2018, 2159 . 2161 [RFC8923] Welzl, M. and S. Gjessing, "A Minimal Set of Transport 2162 Services for End Systems", RFC 8923, DOI 10.17487/RFC8923, 2163 October 2020, . 2165 14.2. Informative References 2167 [I-D.ietf-quic-transport] 2168 Iyengar, J. and M. Thomson, "QUIC: A UDP-Based Multiplexed 2169 and Secure Transport", Work in Progress, Internet-Draft, 2170 draft-ietf-quic-transport-34, 14 January 2021, 2171 . 2174 [I-D.ietf-tcpm-2140bis] 2175 Touch, J., Welzl, M., and S. 
Islam, "TCP Control Block 2176 Interdependence", Work in Progress, Internet-Draft, draft- 2177 ietf-tcpm-2140bis-11, 12 April 2021, 2178 . 2181 [NEAT-flow-mapping] 2182 "Transparent Flow Mapping for NEAT", IFIP NETWORKING 2017 2183 Workshop on Future of Internet Transport (FIT 2017) , 2184 2017. 2186 [RFC1928] Leech, M., Ganis, M., Lee, Y., Kuris, R., Koblas, D., and 2187 L. Jones, "SOCKS Protocol Version 5", RFC 1928, 2188 DOI 10.17487/RFC1928, March 1996, 2189 . 2191 [RFC3124] Balakrishnan, H. and S. Seshan, "The Congestion Manager", 2192 RFC 3124, DOI 10.17487/RFC3124, June 2001, 2193 . 2195 [RFC3207] Hoffman, P., "SMTP Service Extension for Secure SMTP over 2196 Transport Layer Security", RFC 3207, DOI 10.17487/RFC3207, 2197 February 2002, . 2199 [RFC5389] Rosenberg, J., Mahy, R., Matthews, P., and D. Wing, 2200 "Session Traversal Utilities for NAT (STUN)", RFC 5389, 2201 DOI 10.17487/RFC5389, October 2008, 2202 . 2204 [RFC5766] Mahy, R., Matthews, P., and J. Rosenberg, "Traversal Using 2205 Relays around NAT (TURN): Relay Extensions to Session 2206 Traversal Utilities for NAT (STUN)", RFC 5766, 2207 DOI 10.17487/RFC5766, April 2010, 2208 . 2210 [RFC6525] Stewart, R., Tuexen, M., and P. Lei, "Stream Control 2211 Transmission Protocol (SCTP) Stream Reconfiguration", 2212 RFC 6525, DOI 10.17487/RFC6525, February 2012, 2213 . 2215 [RFC6762] Cheshire, S. and M. Krochmal, "Multicast DNS", RFC 6762, 2216 DOI 10.17487/RFC6762, February 2013, 2217 . 2219 [RFC6763] Cheshire, S. and M. Krochmal, "DNS-Based Service 2220 Discovery", RFC 6763, DOI 10.17487/RFC6763, February 2013, 2221 . 2223 [RFC7230] Fielding, R., Ed. and J. Reschke, Ed., "Hypertext Transfer 2224 Protocol (HTTP/1.1): Message Syntax and Routing", 2225 RFC 7230, DOI 10.17487/RFC7230, June 2014, 2226 . 2228 [RFC7657] Black, D., Ed. and P. Jones, "Differentiated Services 2229 (Diffserv) and Real-Time Communication", RFC 7657, 2230 DOI 10.17487/RFC7657, November 2015, 2231 . 
2233 [RFC8085] Eggert, L., Fairhurst, G., and G. Shepherd, "UDP Usage 2234 Guidelines", BCP 145, RFC 8085, DOI 10.17487/RFC8085, 2235 March 2017, . 2237 [RFC8260] Stewart, R., Tuexen, M., Loreto, S., and R. Seggelmann, 2238 "Stream Schedulers and User Message Interleaving for the 2239 Stream Control Transmission Protocol", RFC 8260, 2240 DOI 10.17487/RFC8260, November 2017, 2241 . 2243 [RFC8445] Keranen, A., Holmberg, C., and J. Rosenberg, "Interactive 2244 Connectivity Establishment (ICE): A Protocol for Network 2245 Address Translator (NAT) Traversal", RFC 8445, 2246 DOI 10.17487/RFC8445, July 2018, 2247 . 2249 [TCP-COUPLING] 2250 "ctrlTCP: Reducing Latency through Coupled, Heterogeneous 2251 Multi-Flow TCP Congestion Control", IEEE INFOCOM Global 2252 Internet Symposium (GI) workshop (GI 2018) , n.d.. 2254 Appendix A. API Mapping Template 2256 Any protocol mapping for the Transport Services API should follow a 2257 common template. 2259 Connectedness: (Connectionless/Connected/Multiplexing Connected) 2261 Data Unit: (Byte-stream/Datagram/Message) 2263 Connection Object: 2265 Initiate: 2267 InitiateWithSend: 2269 Ready: 2271 InitiateError: 2273 ConnectionError: 2275 Listen: 2277 ConnectionReceived: 2279 Clone: 2281 Send: 2283 Receive: 2285 Close: 2287 Abort: 2289 Appendix B. Additional Properties 2291 This appendix discusses implementation considerations for additional 2292 parameters and properties that could be used to enhance transport 2293 protocol and/or path selection, or the transmission of messages given 2294 a Protocol Stack that implements them. These are not part of the 2295 interface, and may be removed from the final document, but are 2296 presented here to support discussion within the TAPS working group as 2297 to whether they should be added to a future revision of the base 2298 specification. 2300 B.1. 
Properties Affecting Sorting of Branches 2302 In addition to the Protocol and Path Selection Properties discussed 2303 in Section 4.1.3, the following properties under discussion can 2304 influence branch sorting: 2306 * Bounds on Send or Receive Rate: If the application indicates a 2307 bound on the expected Send or Receive bitrate, an implementation 2308 may prefer a path that can likely provide the desired bandwidth, 2309 based on cached maximum throughput (see Section 9.2). The 2310 application may know the Send or Receive Bitrate from metadata in 2311 adaptive HTTP streaming, such as MPEG-DASH. 2313 * Cost Preferences: If the application indicates a preference to 2314 avoid expensive paths, and some paths are associated with a 2315 monetary cost, an implementation should decrease the ranking of 2316 such paths. If the application indicates that it prohibits using 2317 expensive paths, paths that are associated with a cost should be 2318 purged from the decision tree. 2320 Appendix C. Reasons for errors 2322 The Transport Services API [I-D.ietf-taps-interface] allows 2323 several generic error types to specify a more detailed reason as to 2324 why an error occurred. This appendix lists some of the possible 2325 reasons. 2327 * InvalidConfiguration: The transport properties and endpoints 2328 provided by the application are either contradictory or 2329 incomplete. Examples include the lack of a Remote Endpoint on an 2330 active open or using a multicast group address while not 2331 requesting a unidirectional receive. 2333 * NoCandidates: The configuration is valid, but none of the 2334 available transport protocols can satisfy the transport properties 2335 provided by the application. 2337 * ResolutionFailed: The remote or local specifier provided by the 2338 application cannot be resolved.
2340 * EstablishmentFailed: The Transport Services system was unable to 2341 establish a transport-layer connection to the Remote Endpoint 2342 specified by the application. 2344 * PolicyProhibited: The system policy prevents the transport system 2345 from performing the action requested by the application. 2347 * NotCloneable: The protocol stack is not capable of being cloned. 2349 * MessageTooLarge: The message size is too big for the transport 2350 system to handle. 2352 * ProtocolFailed: The underlying protocol stack failed. 2354 * InvalidMessageProperties: The message properties are either 2355 contradictory to the transport properties or they cannot be 2356 satisfied by the transport system. 2358 * DeframingFailed: The data that was received by the underlying 2359 protocol stack could not be deframed. 2361 * ConnectionAborted: The connection was aborted by the peer. 2363 * Timeout: Delivery of a message was not possible after a timeout. 2365 Appendix D. Existing Implementations 2367 This appendix gives an overview of existing implementations, at the 2368 time of writing, of transport systems that are (to some degree) in 2369 line with this document. 2371 * Apple's Network.framework: 2373 - Network.framework is a transport-level API built for C, 2374 Objective-C, and Swift. It is a connect-by-name API that supports 2375 transport security protocols. It provides userspace 2376 implementations of TCP, UDP, TLS, DTLS, proxy protocols, and 2377 allows extension via custom framers. 2379 - Documentation: https://developer.apple.com/documentation/ 2380 network (https://developer.apple.com/documentation/network) 2382 * NEAT and NEATPy: 2384 - NEAT is the output of the European H2020 research project 2385 "NEAT"; it is a user-space library for protocol-independent 2386 communication on top of TCP, UDP and SCTP, with many more 2387 features such as a policy manager.
2389 - Code: https://github.com/NEAT-project/neat (https://github.com/ 2390 NEAT-project/neat) 2392 - NEAT project: https://www.neat-project.org (https://www.neat- 2393 project.org) 2395 - NEATPy is a Python shim over NEAT which updates the NEAT API to 2396 be in line with version 6 of the Transport Services API draft. 2398 - Code: https://github.com/theagilepadawan/NEATPy 2399 (https://github.com/theagilepadawan/NEATPy) 2401 * PyTAPS: 2403 - A TAPS implementation based on Python asyncio, offering 2404 protocol-independent communication to applications on top of 2405 TCP, UDP and TLS, with support for multicast. 2407 - Code: https://github.com/fg-inet/python-asyncio-taps 2408 (https://github.com/fg-inet/python-asyncio-taps) 2410 Authors' Addresses 2411 Anna Brunstrom (editor) 2412 Karlstad University 2413 Universitetsgatan 2 2414 651 88 Karlstad 2415 Sweden 2416 Email: anna.brunstrom@kau.se 2418 Tommy Pauly (editor) 2419 Apple Inc. 2420 One Apple Park Way 2421 Cupertino, California 95014, 2422 United States of America 2423 Email: tpauly@apple.com 2425 Theresa Enghardt 2426 Netflix 2427 121 Albright Way 2428 Los Gatos, CA 95032, 2429 United States of America 2430 Email: ietf@tenghardt.net 2432 Philipp S. Tiesel 2433 SAP SE 2434 Konrad-Zuse-Ring 10 2435 14469 Potsdam 2436 Germany 2437 Email: philipp@tiesel.net 2439 Michael Welzl 2440 University of Oslo 2441 PO Box 1080 Blindern 2442 0316 Oslo 2443 Norway 2444 Email: michawe@ifi.uio.no